Speed Reader AI Docs
Back to App Ref: OpenAI Docs

Speed Reader AI — Overview

Speed Reader AI helps you read, comprehend, and retain information faster. It combines a visual rapid serial visual presentation (RSVP) reader, high‑quality text‑to‑speech, a Teleprompt mode for performance reading, and built‑in AI tools for summarization, quizzing, and structured study workflows.

💡
Use the left sidebar to jump between topics. Type in the search box and press Enter to jump to the best matching section (e.g., “slider”, “WPM”, “quiz”, “teleprompt cadence”).

Quickstart

  1. Open the app and paste text, upload a file, or extract from a URL.
  2. Select Visual, Read‑to‑Me, or AI Features from the Mode selector.
  3. Press Start (Visual) or Play (TTS) to begin. Adjust speed, chunking, and ORP color as needed.
  4. Use Export to save progress or Export as MD to produce a Markdown study sheet.

Supported inputs

  • Paste text directly
  • Upload .txt, .pdf, .epub, .docx
  • Extract text from a URL
  • Capture text with OCR camera

Key outputs

  • RSVP visual reading with stats
  • Natural TTS audio with speed control
  • AI summaries, custom prompts, quizzes
  • Markdown export and audio packs

Controls reference

  • Speed slider (WPM): Sets the display rate for RSVP in Visual mode. Typical comfortable range: 250–450 WPM. Advanced readers: 600–800+ WPM. Increase gradually to maintain comprehension.
  • Words per chunk: Number of words shown per flash (1–5). 1–3 improves fixation and acuity at high WPM; 4–5 increases contextual coherence at lower WPM.
  • ORP color: Optimal Recognition Point highlight color. Increase contrast for better fixation at higher speeds.
  • TTS speed slider: Playback rate for Read‑to‑Me (0.25×–4.0×). For dense math/philosophy, use 0.75–1.25×; for lighter prose, 1.25–2.0×.

Reading modes & cadence

RSVP minimizes saccades (eye jumps) by keeping the text in a stable locus and controlling temporal cadence. Comprehension depends not only on WPM but on cadence shaping—how pauses and bursts map to syntax.

  • Temporal chunking: Short function words tolerate higher cadence; content words benefit from micro‑pauses for integration.
  • Punctuation heuristics (guidance): Pause longer after sentence terminators (., !, ?), medium after semicolons/colons, brief after commas/parentheticals. While the app streams at uniform WPM, you can simulate cadence effects by lowering WPM for dense passages and using sentence navigation to replay.
  • Chunk size vs. WPM: At high WPM, use smaller chunks (1–2) to reduce masking; at lower WPM, larger chunks (3–5) restore prosodic feel.
  • Fixation stability: Maintain consistent ORP color/contrast to anchor micro‑saccades and reduce regressions.
Advanced practice: cycle 90–120 seconds at target WPM, then 30–45 seconds recovery at −20% WPM to consolidate gist without fatigue.

Visual speed reading

The visual reader displays one to five words at a time at a chosen WPM. Adjust chunk size, set the ORP color, and navigate sentence‑by‑sentence. A progress bar and live stats track words read, total words, time elapsed, and current WPM.

  • Controls: Start, Pause, Reset, Fullscreen, Teleprompt mode
  • Sentence navigation: Repeat current sentence, advance to next
  • Stats: Words Read, Total Words, Time Elapsed, Current WPM

Teleprompt mode

Teleprompt presents the text as a large, smooth‑scrolling prompt for talks, presentations, or performance reading. While RSVP optimizes for speed at a fixed gaze, Teleprompt optimizes for delivery—breath, emphasis, and audience contact.

Purpose

  • Rehearsal: Practice scripts with consistent pacing and visual clarity.
  • Live delivery: Present with minimal cognitive load so attention can shift to tone and audience.

Cadence and phrasing

  • Cadence: Aim for 150–180 WPM conversationally; 120–140 WPM for technical/novice audiences. Use sentence boundaries and clause commas as micro‑pause anchors.
  • Phrasing: Break at semantic units (noun phrases, verb phrases) to maintain coherence. Avoid mid‑unit breaks that fragment meaning.
  • Breath: Plan inhale points every 10–15 seconds; longer lines demand earlier breaths. Balance respiratory cadence with rhetorical emphasis.
  • Prosody: Vary pitch and intensity at clause ends to signal structure. Technical density benefits from descending contours; persuasion can leverage rising contours on claims.

Typography and visibility

  • Use large high‑contrast fonts and generous line height for peripheral readability.
  • Keep line length 45–70 characters to reduce return‑sweep errors.
Workflow: draft → summarize with AI for talk outline → convert to Teleprompt → rehearse at 130–160 WPM → export notes or key terms.

Read‑to‑Me (TTS)

Generate high‑quality speech from your text using selectable voices and adjustable playback speed. The player supports play/pause/stop, sentence navigation, and track downloads.

  • Enter an API key for your selected provider
  • Choose a voice and speed (0.25× – 4.0×)
  • Style prompt: guide tone, pacing, and pronunciation
  • Export: Full Track and Audio Pack (zip)

AI features (state‑of‑the‑art study tooling)

Speed Reader AI integrates with multiple providers to synthesize, structure, and test knowledge.

Summarize Text

  • Abstractive focus: Produces concise, sectioned summaries capturing gist, claims, and caveats.
  • Structure: Uses headings, bullet hierarchies, and key takeaways for rapid review.

Custom Summary

  • Accepts style instructions (e.g., “for executives”, “Feynman explanation”, “compare X vs Y”).
  • Useful frameworks: SCQA, Claim‑Evidence‑Reasoning, Problem‑Approach‑Result‑Implications.

Quiz Generator

  • Active recall: Generates questions to retrieve knowledge, improving retention.
  • Cognitive coverage: Mix factual, conceptual, procedural, and conditional questions (Bloom’s taxonomy from Remember→Evaluate).
  • Calibration: Start with open‑ended prompts; add distractor choices to assess discrimination.
Configure provider, model, and API key in the AI Configuration panel within the app.

Document inputs

  • Paste text directly into the editor with line numbers
  • Upload supported files: .txt, .pdf, .epub, .docx
  • Extract from a URL via the URL form
  • OCR camera for scanned pages and photos

OCR camera

Built on Tesseract.js, the OCR camera captures text from images with adjustable levels to improve recognition. Use it to ingest printed material or whiteboard notes quickly.

Key terms

Double‑click words to collect them into your Key Terms panel. Print the list or clear it as needed. For best retention, convert terms into Q/A pairs (define, compare, apply) and interleave across sessions.

Chapter selection

For long documents, select chapters to process. Choose all or only the sections you want, then process the selected subset to focus your reading.

Export / Import

  • Export progress to JSON; import later to resume
  • Export as Markdown with headings, summaries, and notes
  • Audio Pack export/import for TTS assets

AI providers & models

Speed Reader AI supports multiple providers. The app populates model lists dynamically from configuration. Below is a professionally curated overview of the options present in your configuration, including strengths, ideal use cases, and a pricing snapshot.

Pricing evolves quickly and may differ by account, region, or via aggregators like OpenRouter. Treat the costs below as a qualitative guide: “Free” means no per‑token API charge (e.g., OpenRouter free tier or local inference). “Provider‑priced” means billed by the provider; follow the link to confirm current rates.

OpenAI

Frontier multimodal models with strong instruction‑following, tool use, and reasoning. Choose smaller variants for lower latency and cost.

Model Profile & Benefits Pricing
gpt-5 Frontier general model for complex reasoning, synthesis, and agentic workflows. Best for highest‑quality summaries and nuanced analysis. Standard per 1M tokens — Input $1.25, Cached $0.125, Output $10.00. Source: OpenAI
gpt-5-chat Chat‑optimized variant of GPT‑5 tuned for dialogue, safety, and helpfulness. Ideal for interactive study assistants and Q/A. Standard per 1M tokens — Input $1.25, Cached $0.125, Output $10.00. Source: OpenAI
gpt-5-mini Latency/throughput‑oriented. Balanced quality for everyday summarization and quiz generation at lower cost. Standard per 1M tokens — Input $0.25, Cached $0.025, Output $2.00. Source: OpenAI
gpt-5-nano Cost‑efficient micro‑model for fast drafts, boilerplate, or light classification. Use when volume matters more than marginal quality. Standard per 1M tokens — Input $0.05, Cached $0.005, Output $0.40. Source: OpenAI
gpt-4.1-mini Well‑known value model: strong instruction following at low latency. Great for summaries and quizzes with predictable behavior. Standard per 1M tokens — Input $0.40, Cached $0.10, Output $1.60. Source: OpenAI
gpt-4o-mini Multimodal‑capable “omni” mini; excellent price/performance for text tasks. Recommended default for summaries/quiz when cost sensitive. Standard per 1M tokens — Input $0.15, Cached $0.075, Output $0.60. Source: OpenAI
gpt-4.1-nano Ultra‑light footprint for high‑volume requests with basic accuracy. Use for simple extractions and scaffolding. Standard per 1M tokens — Input $0.10, Cached $0.025, Output $0.40. Source: OpenAI

Anthropic (Claude)

Strong at harmlessness and long‑form analysis with high factuality. Good for structured study materials and safer content generation.

Model Profile & Benefits Pricing
claude-3-5-haiku-latest Fast, low‑cost. Solid quality for summarization and short quizzes. Ideal for large batch workloads. Per 1M tokens — Input $0.80, Output $4.00; Prompt caching: Write $1.00, Read $0.08. Source: Anthropic
claude-3-7-sonnet-latest Balanced flagship with excellent instruction following and helpful tone. Great for study guides and explanations. Per 1M tokens — Input $3.00, Output $15.00 (typical Sonnet tier). Source: Anthropic
claude-opus-4-20250514 Top‑tier reasoning and nuanced synthesis. Best for complex, high‑stakes content. Per 1M tokens — Input $15.00, Output $75.00; Prompt caching: Write $18.75, Read $1.50. Source: Anthropic
claude-sonnet-4-20250514 Updated Sonnet (Claude 4 family). Strong overall quality at lower cost than Opus. Per 1M tokens — ≤200K: Input $3.00, Output $15.00; >200K: Input $6.00, Output $22.50; Caching ≤200K: Write $3.75, Read $0.30. Source: Anthropic

DeepSeek

Competitive performance with a reasoner variant for chain‑of‑thought and tool‑use heavy tasks.

Model Profile & Benefits Pricing
deepseek-chat General chat model with good price/perf. Suitable for summaries and Q/A. Standard per 1M tokens — Input (cache hit) $0.07, Input (cache miss) $0.27, Output $1.10. Off‑peak (UTC 16:30–00:30): hit $0.035, miss $0.135, output $0.550. Source: DeepSeek
deepseek-reasoner Reasoning‑forward variant; better for multi‑step solutions and rubric‑aligned quizzes. Standard per 1M tokens — Input (cache hit) $0.14, Input (cache miss) $0.55, Output $2.19. Off‑peak discounts apply as documented. Source: DeepSeek

OpenRouter (aggregator)

Access many providers/models through a single API. Your config includes the following free or general routes:

Route Profile & Benefits Pricing
openai/gpt-oss-20b:free Open‑source 20B class model via OpenRouter. Good for baseline summaries and lightweight tasks. Free via OpenRouter
google/gemini-2.0-flash-exp:free Gemini Flash experimental route. Fast responses; excellent for quick summaries and outline generation. Free via OpenRouter
moonshotai/kimi-k2:free Kimi K2 route for general chat and synthesis. Free via OpenRouter
tencent/hunyuan-a13b-instruct:free Instruction‑tuned 13B model; efficient for simple tasks and bulk processing. Free via OpenRouter
qwen/qwen3-235b-a22b-07-25:free Large Qwen variant via free route; strong general capabilities for summaries and Q/A. Free via OpenRouter
openai/gpt-oss-120b 120B‑class OSS route. Higher quality than 20B; better long‑form synthesis. Provider‑priced via OpenRouter (non‑free). See OpenRouter pricing.

Ollama (local)

Run local models with zero per‑token API cost and maximum data privacy. Performance depends on your CPU/GPU and chosen model size.

  • Cost: $0 per API call (local compute only).
  • Best when offline, cost‑sensitive, or requiring data residency.
Pricing snapshot as of 2025-08-15 (per 1M tokens, USD). Sources: OpenAI, Anthropic, DeepSeek, OpenRouter. Verify current rates before production. Tiers (Flex/Standard/Priority), prompt caching, and time‑of‑day discounts (DeepSeek) affect effective cost.
// In-app steps:
// 1) Open AI Configuration
// 2) Select Provider (e.g., OpenAI)
// 3) Choose Model from the populated list
// 4) Paste API Key (if required)
// 5) Use Summarize / Custom Summary / Quiz in AI Features

Voices & speed

The TTS engine offers multiple voice options and playback speed control. Style prompt lets you specify tone, pacing, and pronunciation hints (e.g., acronyms, names).

Markdown exporter

Export your study materials in Markdown. The exporter structures headings and preserves lists for easy review in any Markdown viewer or notes app.

Readwise

Readwise integration helps consolidate your highlights and notes. Use the Readwise panel to sync summaries or key insights from your reading sessions.


Performance & accuracy

  • Use chunking 1–3 words for maximum recognition at high WPM
  • Increase ORP contrast to maintain fixation
  • For OCR, ensure sharp images, good lighting, and straight alignment
  • Alternate sprint and recovery blocks to maintain comprehension at higher speeds

Shortcuts

  • ESC: exit fullscreen
  • Space: pause/resume (visual)
  • Arrow keys (where applicable): sentence navigation

Troubleshooting

  • If TTS won’t start, verify provider, model, and a valid API key
  • For PDFs that fail to parse, try OCR camera or export to .txt
  • If sidebar highlighting desyncs, refresh once to reset anchors

Privacy

API keys are entered locally in the app. Review your provider’s data retention policy. Exported files remain on your device unless you choose to share them.

Changelog (high‑level)

  • Visual reader with stats, sentence nav, fullscreen
  • Read‑to‑Me with voices, speed, and style prompts
  • AI features: summarize, custom summaries, quizzes
  • OCR camera with adjustable levels
  • Key terms, teleprompt mode, chapter selection
  • Markdown and audio pack export/import