Thomas Talk — AAC Spelling Assistant · Continuity / Handoff

This document brings a fresh Claude Code session up to speed on the talk app. Read it fully before making changes.

What this app is

A single-page web app that acts as an AAC (Augmentative and Alternative Communication) aid for a Duchenne muscular dystrophy patient who has temporarily lost the use of his voice. The patient spells out messages one letter at a time; an operator (or the patient) drives a simple two-step selection. It is a real tool for a real person — correctness, reliability, and not-breaking matter more than cleverness.

Users: a small, easily-trained set of people. One patient, a few helpers. Niceties can be sacrificed for reliability; obscure power-features (e.g. the #test panel) are fine because users can be told about them.

How the app works (interaction model)

Three screens, one at a time, inside a single <main class="stage">. The app is tablet/phone only — there are no keyboard hotkeys (removed in v17 to simplify; the operator interacts via tap):

  1. Pick screen — three coloured cards: Vowel, Consonant, Pain / Discomfort. Below them, a Next word button inserts a space into the message. No heading; the cards are self-explanatory.
  2. Cycle screen — the chosen category auto-advances one item at a time in huge font at a user-set speed (default 1.5 s/item, persisted per device). Tapping anywhere on the stage stops the stream and opens the review. The wide hit-area is deliberate — the centred letter was a small target that operators missed routinely. The dock (message box, speed, toggles) sits outside the stage so its controls are not affected.
  3. Review screen — shows the stopped item plus the two shown immediately before it, newest-first, wrapping around the current set. In the alphabetical consonant stream, for example, stopping on D gives D, C, B and stopping on C gives C, B, Z. The Next button rotates the highlight through the three; Commit adds the highlighted item to the message and returns to Pick. Tapping a review tile commits that item immediately. Resume returns to the stream from the same position it stopped at, which undoes an inadvertent tap without losing the operator’s place. Back abandons the category entirely.
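
The wraparound in the review pool can be sketched as below. This is a minimal illustration rather than the app's actual code: buildReviewPool and its parameters are hypothetical names, and it assumes the cycle tracks the stopped item as an index into the current item list.

```javascript
// Hypothetical helper: returns the stopped item plus the items shown
// immediately before it, newest first, wrapping around the set.
function buildReviewPool(items, stoppedIndex, size = 3) {
  const n = items.length;
  const pool = [];
  for (let back = 0; back < Math.min(size, n); back++) {
    // Adding n before the modulo keeps the index non-negative.
    pool.push(items[(stoppedIndex - back + n) % n]);
  }
  return pool;
}

const vowels = ["A", "E", "I", "O", "U"];
console.log(buildReviewPool(vowels, 2)); // [ 'I', 'E', 'A' ]
console.log(buildReviewPool(vowels, 0)); // [ 'A', 'U', 'O' ]
```

Capping the loop at the list length means a very short list never repeats an item in the pool.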

The message box at the bottom is a display-only <div> (NOT a textarea — see “Gotchas”). Backspace and Clear buttons edit it.

All three categories use the streaming model. (Vowels were briefly a static all-vowels screen; that was reverted so the operator never has to read items aloud — the stream + speech does it.)

For vowels and consonants the cycled item is a single uppercase letter. A dock checkbox, Common consonants first, switches the consonant stream from alphabetical order to English frequency order (CONSONANTS_FREQ: T N S H R D L C M W F G Y P B V K J X Q Z). Vowels and the Pain/Discomfort category are unaffected. The toggle is off by default because the patient may prefer alphabetical order even though frequency order is faster on average. A change takes effect on the next consonant cycle; an in-progress stream is not re-ordered.
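
One way to get the "next cycle only" behaviour is to choose the list once, when a consonant cycle starts. The sketch below assumes that approach; consonantItems, CONSONANTS_ALPHA and the freqOrderOn parameter are hypothetical names, and only CONSONANTS_FREQ is taken from the app.

```javascript
const VOWELS = ["A", "E", "I", "O", "U"];

// Alphabetical consonants: every letter A-Z that is not a vowel.
const CONSONANTS_ALPHA = Array.from({ length: 26 }, (_, i) =>
  String.fromCharCode(65 + i)
).filter((ch) => !VOWELS.includes(ch));

// Frequency order exactly as listed above.
const CONSONANTS_FREQ = "T N S H R D L C M W F G Y P B V K J X Q Z".split(" ");

// Hypothetical helper, called when a consonant cycle starts. It returns a
// copy, so flipping the toggle later cannot re-order a running stream.
function consonantItems(freqOrderOn) {
  return (freqOrderOn ? CONSONANTS_FREQ : CONSONANTS_ALPHA).slice();
}
```

Both lists hold the same 21 letters, so the toggle only changes how long the operator waits for a given letter, never which letters are reachable.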

The Pain/Discomfort category is a two-level hierarchy:

  • L1 (activeMode === "body") streams the 5-item BODY_PARTS_L1 list: hips, arms, knees, legs, head. Ordered most-common-first for a bed-bound DMD patient. Committing a body part does not write anything to the message — instead, it transitions to L2 for that part.
  • L2 (activeMode === "adjustment") streams the adjustment phrases for the L1-chosen part, stored in ADJUSTMENTS[part]. Each phrase is written in the operator’s voice (the operator is the one running the app and saying the words aloud), so phrases drop the patient’s “my” — e.g. "raise head", "left arm in", "hips toward me". Every L2 list opens with a "[part] hurts" phrase so a simple pain report is a one-stop selection inside the part.
  • Both levels share the rose --pain accent and “body” CSS class — the L2 cycle screen and review tiles render exactly like L1, just with longer phrases. The mode tag reads PAIN / DISCOMFORT at L1 and PAIN / DISCOMFORT: <part> at L2.
  • L2 commit pushes the phrase to spoken as a single chunk (smart leading space if the message doesn’t already end with one) and returns to Pick. Backspace removes the whole phrase in one tap.
  • L2 “Back” returns to Pick (not L1) — simpler, and the operator can re-tap Pain/Discomfort to restart the L1 cycle. L2 “Resume” continues the L2 stream from where it stopped.

The L1/L2 entry path is unified through enterCycle(mode, items); startCategory(mode) is the L1 entry, startAdjustmentL2(part) is the L2 entry, and reviewConfirm() dispatches between “commit phrase” and “transition to L2” based on activeMode.
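
The dispatch described above can be sketched without any DOM. This is an assumption-laden outline, not the code in index.html: the ADJUSTMENTS entry is a two-phrase subset for illustration, the return values are hypothetical, and skipping the leading space on an empty message is an assumption.

```javascript
// Illustrative subset only; the real lists live in index.html.
const ADJUSTMENTS = { head: ["head hurts", "raise head"] };

const spoken = [];      // committed chunks: one entry per letter or phrase
let activeMode = null;  // "vowel" | "consonant" | "body" | "adjustment"
let activePart = null;

function reviewConfirm(item) {
  if (activeMode === "body") {
    // L1: nothing is written; move on to the L2 stream for this part.
    activePart = item;
    activeMode = "adjustment";
    return { next: "cycle", items: ADJUSTMENTS[item] };
  }
  if (activeMode === "adjustment") {
    // Smart leading space (assumed: none needed on an empty message).
    const message = spoken.join("");
    const needsSpace = message.length > 0 && !message.endsWith(" ");
    spoken.push((needsSpace ? " " : "") + item);
  } else {
    spoken.push(item); // vowel or consonant: a single letter
  }
  activeMode = null;
  activePart = null;
  return { next: "pick" };
}

function backspace() {
  spoken.pop(); // drops the last chunk, so a whole phrase goes in one tap
}
```

Storing the message as a list of chunks rather than one string is what makes single-tap removal of a phrase straightforward in this sketch.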

Current state

  • Version: v21. APP_VERSION in index.html and CACHE_VERSION in sw.js must always be bumped together on every release.
  • Hosted as a static site at https://codelahoma.github.io/talk/ (the codelahoma.github.io GitHub Pages repo, in a talk/ subdirectory). Not linked from any other page — direct URL only.
  • Fully offline-capable: fonts are embedded as base64; a service worker caches the app shell.
  • Tested and working in Chrome for iOS and in Safari on iPhone and iPad.

Files in this directory

  • index.html — the entire app: HTML + CSS + JS in one file (~1268 lines). All logic lives in one IIFE in the single <script> block.
  • sw.js — service worker. Cache-first, purges old caches on activate.
  • manifest.webmanifest — PWA manifest for “Add to Home Screen”.
  • icon.svg, icon-180.png, icon-512.png — app icons (dark tile, “A”).
  • CONTINUITY.md — this file.

Architecture notes

  • Single self-contained file. No build step, no external network dependencies at runtime. Fonts (Lora for display, Poppins for UI) are embedded as base64 @font-face rules. This was a deliberate choice so the app works fully offline once cached.
  • One adaptive layout, not separate phone/tablet builds. A @media (max-width: 620px) block handles phones (portrait is the primary phone case); a max-height: 460px landscape block handles sideways phones. Designed/tested at iPad dimensions first.
  • State machine: state is one of STATE.PICK / CYCLE / REVIEW. show() toggles the .hidden class on the three <section>s.
  • Speech uses the Web Speech API (speechSynthesis). It is ON by default (as of v16): the patient is bed-bound and the screen is operator-facing, so the audio channel is the patient’s only window into app state. The dock checkbox Speak each item controls ambient per-item speech (cycle stream, review highlight, commit confirmation) and persists. The Speak button in the dock plays the whole composed message and intentionally ignores that toggle — explicitly tapping Speak always plays.
  • Audible commit confirmation (as of v21): reviewConfirm() calls speak(item) after pushing the committed item to the message, so the patient hears what landed. Without this they only heard the highlight speak in review and then silence on commit, with no way to confirm the right thing was selected.

Key JS functions (in index.html’s IIFE)

  • startCategory(mode) — L1 entry for "vowel", "consonant", or "body". Routes through enterCycle(mode, items), which wires the state machine and DOM once for both L1 and L2.
  • startAdjustmentL2(part) — L2 entry for the Pain/Discomfort hierarchy; pulls ADJUSTMENTS[part] and re-uses enterCycle("adjustment", items).
  • setCategoryTag(mode) — paints the mode tag for either level; for "adjustment" it reads PAIN / DISCOMFORT: <activePart> and styles with the same body (rose) class.
  • paintCycleLetter() / scheduleNext() / stopCycle() — the auto-advance.
  • onCycleInterrupt() — builds the 3-item review pool, opens review.
  • resumeCycle() — re-enters the cycle from the same position it stopped at; re-paints (and re-speaks) the current item once before timing resumes. Wired to the Resume button.
  • openReview() / paintReview() / reviewNext() / reviewConfirm() — the review screen. reviewConfirm() dispatches between “transition to L2” (when activeMode === "body") and “push to message” otherwise.
  • makeTickHandler(i) — per-tile tap handler (tap commits immediately).
  • speak(text) / letterToSpeech(ch) / unlockSpeech() — per-item speech output.
  • speakMessage() — speaks the whole composed message via a single utterance (lower-cased so engines that treat all-caps as initialisms still read spelled words as words). Wired to the dock Speak button. Bypasses speechOn; the button explicitly says “play this”.
  • applySpeed() + sliderToSeconds() / secondsToSlider() — speed control.
  • buildTestPanel() / openTestPanel() / speakRaw() — the #test panel.
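
For orientation, here is one possible shape for the sliderToSeconds() / secondsToSlider() pair listed above. The bounds and the linear mapping are assumptions made for illustration; the real constants and formula are in index.html.

```javascript
// Hypothetical bounds, not the app's real values.
const SLIDER_MIN = 0;
const SLIDER_MAX = 100;
const SLOWEST_S = 4.0; // seconds per item at the far left
const FASTEST_S = 0.5; // seconds per item at the far right

// Moving the slider right makes the stream faster (fewer seconds).
function sliderToSeconds(value) {
  const t = (value - SLIDER_MIN) / (SLIDER_MAX - SLIDER_MIN); // 0 = left, 1 = right
  return SLOWEST_S - t * (SLOWEST_S - FASTEST_S);
}

// Inverse mapping, rounded to a whole slider step.
function secondsToSlider(seconds) {
  const t = (SLOWEST_S - seconds) / (SLOWEST_S - FASTEST_S);
  return Math.round(SLIDER_MIN + t * (SLIDER_MAX - SLIDER_MIN));
}
```

Keeping the delay in seconds as the source of truth and deriving the slider position from it means the stored setting stays meaningful even if the slider range changes later.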

Speech — IMPORTANT, the hardest-won part

iOS speech has two traps that cost several iterations to solve:

  1. A bare uppercase letter is read as “Capital A”. Never pass a lone uppercase letter to speechSynthesis.
  2. The bare word “ay” is read as “aye” (long-I). Creative single-syllable spellings get misread.

The fix is LETTER_SAY — a map of each letter to a device-verified phonetic spelling. These spellings were tested by ear on the actual iOS device via the #test panel. DO NOT change LETTER_SAY without re-verifying through the #test panel. Current verified map:

A:"ayy" B:"bee" C:"see" D:"dee" E:"ee"  F:"eff" G:"jee" H:"aitch" I:"eye"
J:"jay" K:"kay" L:"ell" M:"emm" N:"enn" O:"ohh" P:"pee" Q:"cue"   R:"are"
S:"ess" T:"tee" U:"yoo" V:"vee" W:"double yoo" X:"eks" Y:"why"    Z:"zee"
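
As a sketch, the lookup that keeps bare uppercase letters away from the speech engine might look like this. The map is copied from the verified table above; the case-insensitive lookup and the pass-through for anything that is not a single letter are assumptions about letterToSpeech(), not confirmed behaviour.

```javascript
// Copied from the device-verified table; do not edit without re-testing.
const LETTER_SAY = {
  A: "ayy", B: "bee", C: "see", D: "dee", E: "ee", F: "eff", G: "jee",
  H: "aitch", I: "eye", J: "jay", K: "kay", L: "ell", M: "emm", N: "enn",
  O: "ohh", P: "pee", Q: "cue", R: "are", S: "ess", T: "tee", U: "yoo",
  V: "vee", W: "double yoo", X: "eks", Y: "why", Z: "zee",
};

// Assumed behaviour: letters map to their phonetic spelling; longer text
// (for example an adjustment phrase) is returned unchanged.
function letterToSpeech(ch) {
  const key = String(ch).toUpperCase();
  return Object.prototype.hasOwnProperty.call(LETTER_SAY, key)
    ? LETTER_SAY[key]
    : String(ch);
}

console.log(letterToSpeech("A"));          // ayy
console.log(letterToSpeech("raise head")); // raise head
```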

Speech is also gated by an iOS unlock: speechSynthesis only works after a genuine user gesture. unlockSpeech() runs on the first pointer/touch/key event (and on toggling speech on), and every speak() calls speechSynthesis.resume() first because Safari aggressively pauses the engine. Do not remove this machinery.
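
The resume-before-speak rule and the one-time unlock can be outlined as follows. This is a sketch under stated assumptions: createSpeaker is a hypothetical wrapper, the engine and utterance constructor are injected so the code also runs outside a browser, and unlocking by speaking an empty utterance is an assumed technique that may differ from the app's real unlockSpeech().

```javascript
// synth stands in for window.speechSynthesis; makeUtterance stands in for
// (text) => new SpeechSynthesisUtterance(text).
function createSpeaker(synth, makeUtterance) {
  let unlocked = false;

  function speak(text) {
    if (!text) return;
    synth.resume(); // Safari may have paused the engine since the last call
    const utterance = makeUtterance(text);
    utterance.rate = 0.9; // the fixed rate noted under open items
    synth.speak(utterance);
  }

  function unlockSpeech() {
    if (unlocked) return; // only the first gesture needs to do this
    unlocked = true;
    synth.resume();
    // Assumption: an empty utterance inside a user gesture unlocks iOS speech.
    synth.speak(makeUtterance(""));
  }

  return { speak, unlockSpeech };
}

// Browser wiring, shown as comments because it needs a real page:
//   const speaker = createSpeaker(window.speechSynthesis,
//     (text) => new SpeechSynthesisUtterance(text));
//   document.addEventListener("pointerdown", speaker.unlockSpeech, { once: true });
```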

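Speech fires on each item as the stream advances, on the highlighted review item, and once when a review tile is tapped. As of v21 it also fires on commit: reviewConfirm() speaks the committed item as a confirmation (see Architecture notes). Earlier versions stayed silent on commit because the item had just been spoken on the highlight.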

The #test panel

A hidden speech-diagnostics panel. Open it by loading .../talk/#test. It lists all 26 letters with candidate spellings; tapping a candidate speaks it verbatim (via speakRaw(), bypassing LETTER_SAY) so a human can pick what sounds right on the device. The chosen spellings render as a copyable line. This is the tool to use whenever LETTER_SAY needs re-tuning.

Release / deploy ritual

  1. Make changes in index.html / sw.js.
  2. Bump BOTH APP_VERSION (index.html) and CACHE_VERSION (sw.js) to the next number, e.g. v21 → v22 and talk-v21 → talk-v22. The two numbers must always match.
  3. Commit and push to the codelahoma.github.io repo (in talk/).
  4. On the device, fully close and reopen the app — the service worker picks up the new sw.js on next launch (sometimes the launch after that).
  5. Confirm the version number in the upper-right corner of the app shows the new number. If it still shows the old one, the service worker hasn’t swapped yet — close/reopen again, or clear site data.
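
Because a mismatched bump is easy to miss, a small optional check could be run before pushing. The sketch below is not part of the project; it assumes the declarations look like APP_VERSION = "v21" and CACHE_VERSION = "talk-v21", and extractVersion and versionsMatch are hypothetical names.

```javascript
// Pulls the numeric part out of a declaration such as
//   APP_VERSION = "v21"   or   CACHE_VERSION = 'talk-v21'
// (assumed shapes; adjust the pattern if the files differ).
function extractVersion(source, name) {
  const pattern = new RegExp(name + "\\s*=\\s*[\"'](?:talk-)?v(\\d+)[\"']");
  const match = source.match(pattern);
  return match ? Number(match[1]) : null;
}

// True only when both versions were found and carry the same number.
function versionsMatch(indexHtml, swJs) {
  const app = extractVersion(indexHtml, "APP_VERSION");
  const cache = extractVersion(swJs, "CACHE_VERSION");
  return app !== null && app === cache;
}

// Possible use from Node, shown as comments because it reads local files:
//   const fs = require("fs");
//   const ok = versionsMatch(fs.readFileSync("index.html", "utf8"),
//                            fs.readFileSync("sw.js", "utf8"));
//   if (!ok) process.exit(1);
```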

The version tag is fixed at top-right (it was moved there from the dock because the browser tab bar clipped the bottom of the screen).

Gotchas / hard-won lessons (don’t re-introduce these bugs)

  • Message box must be a <div>, not a <textarea>. A readonly textarea on iOS does not reliably repaint when its value is changed programmatically — Clear appeared to leave stale text. The div uses textContent; an empty string yields a truly empty element so the :empty CSS placeholder shows.
  • Tap-to-stop is on .stage and gated by both state === STATE.CYCLE AND !e.target.closest('button, [role="button"]'). Without the target check, a click on a picker card (which transitions state to CYCLE inside its own handler) bubbles up to .stage, sees the just-changed state, and instantly stops the stream it started — same trap for the Resume button. The check is the well-known fix for this kind of state-transition bubble race. The dock sits outside .stage so its controls are unaffected; review-screen buttons fire while state === STATE.REVIEW, which the state gate filters out.
  • Review tiles need an aria-label (“the letter B, select”) or VoiceOver announces them as “Capital B”. The big display letters are aria-hidden.
  • The speed slider runs slow→fast left-to-right. Internally speedSeconds is the real delay (default 1.5 s/item as of v17); the slider value is its inverse via sliderToSeconds/secondsToSlider. The + stepper makes it FASTER (fewer seconds). Don’t “simplify” this by making the slider value the seconds.
  • The mode-tag is hidden on phones (the @media blocks) because it’s absolutely positioned and collided with content on small screens.
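
The tap-to-stop gate from the list above reduces to a two-condition test, sketched here as a pure function so the rule can be read without a DOM. The STATE values and the shouldStopStream name are hypothetical; the selector in the commented wiring is the one quoted in the gotcha.

```javascript
// Hypothetical values; the app only needs the three states to be distinct.
const STATE = { PICK: "pick", CYCLE: "cycle", REVIEW: "review" };

// A tap stops the stream only while cycling, and only when it did not
// land on a button-like control that has its own handler.
function shouldStopStream(state, tappedControl) {
  return state === STATE.CYCLE && !tappedControl;
}

// Browser wiring, shown as comments because it needs the real page:
//   stage.addEventListener("click", (e) => {
//     const tappedControl =
//       e.target.closest('button, [role="button"]') !== null;
//     if (shouldStopStream(state, tappedControl)) onCycleInterrupt();
//   });
```

The second condition is the one that matters for the bubble race: a picker card's own handler has already switched the state to CYCLE by the time the click reaches .stage, so the state check alone would pass.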

Persistence

The composed message, speed setting, speech-on toggle, and frequency-order toggle are all persisted to localStorage under the key talk:state as a single JSON blob. loadState() reads on init (before defaults apply) and saveState() is called from every mutation point: renderMessage(), applySpeed(), and the two toggle change handlers. Storage failures (private browsing, quota) are swallowed silently — persistence is non-critical. The dock’s Clear button persists an empty message, so clearing then reloading gives a truly empty box.

The blob carries a version field (STORAGE_VERSION, currently 1). At load time, missing version means the saved state pre-dates the schema and certain stale defaults can be migrated — currently used to clear a pre-v1 speech: false so the new speech-on default applies to users who never explicitly toggled the old default-off.
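
A minimal sketch of the load, migrate and save flow follows. Only the talk:state key, the version field and the speech flag come from the description above; the other field names, the DEFAULTS object and the injected storage parameter are assumptions made so the example runs outside a browser.

```javascript
const STORAGE_KEY = "talk:state";
const STORAGE_VERSION = 1;

// Hypothetical field names, apart from speech and version.
const DEFAULTS = { message: "", speedSeconds: 1.5, speech: true, freqOrder: false };

// storage is anything with getItem/setItem, e.g. window.localStorage.
function loadState(storage) {
  let saved = {};
  try {
    const parsed = JSON.parse(storage.getItem(STORAGE_KEY));
    if (parsed !== null && typeof parsed === "object") saved = parsed;
  } catch (err) {
    saved = {}; // blocked or corrupt storage: fall back to defaults
  }
  if (saved.version === undefined && saved.speech === false) {
    // Pre-schema blob: drop the stale default-off so speech-on applies.
    delete saved.speech;
  }
  return { ...DEFAULTS, ...saved, version: STORAGE_VERSION };
}

function saveState(storage, state) {
  try {
    const blob = { ...state, version: STORAGE_VERSION };
    storage.setItem(STORAGE_KEY, JSON.stringify(blob));
  } catch (err) {
    // Private browsing or quota: persistence is non-critical, so ignore.
  }
}
```

Because every save stamps the current version, a speech: false written after the schema was introduced is treated as an explicit choice and survives the reload.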

Possible next steps / open items

  • Verify on a real phone whether iOS Safari’s bottom toolbar overlaps the dock in portrait. If so, add env(safe-area-inset-bottom) padding to .dock. (Not yet done — flagged but unconfirmed.)
  • Speech rate is fixed at 0.9. Could be made adjustable if needed.
  • Recent-messages list (tap to re-speak common messages without re-composing). Builds on the v21 speakMessage() infrastructure.
  • Quick-replies category (yes/no/thank you/call my wife/etc.) — would fit alongside Pain/Discomfort as another picker card or as a one-tap row above the message box.

Environment / preferences for the developer (Rod)

  • This is a static HTML/CSS/JS app — there is currently no Python, no build, no test suite. Rod’s usual stack (pytest/black/flake8/mypy, pre-commit) does NOT apply to this file as it stands.
  • If the app ever grows into a served service rather than a static GitHub Pages drop, Rod’s pattern is small FastAPI microservices with both REST and MCP interfaces, slotted into a homelab service registry (bookmarks-style apps live around port 8020). That would be the point to introduce cookiecutter scaffolding and pre-commit hooks. Until then, keep it simple: one HTML file.