Product & accessibility
Voice features: read-aloud and dictation in Autistic Mirror
Autistic Mirror can now read responses aloud and accept voice input. Both are optional, can be turned off at any time, and rely only on the browser's built-in capabilities rather than an added external voice service. This article explains the neurological background, the technical setup, and why the voices still sound synthetic for now.
Why a voice feature reduces neurological load
Typing is not neutral. It couples three systems: fine motor control, visual feedback on the screen, and language production. In autistic processing, none of these run silently in the background. Each one consumes measurable executive resources. People who are already sensory-loaded have less budget left for them.
Reading is similar. Long blocks of on-screen text demand saccade control, contrast processing, and attention direction at the same time. On a day with high sensory-filter load, with migraine vulnerability, with POTS-related eye fatigue, or after a long masking day, exactly those resources are scarce. Hearing a response read aloud separates the content from the visual processing cost.
Dictating instead of typing reduces the motor and executive load of formulating. The thought is spoken once, instead of being structured in parallel with the act of writing. For many autistic and AuDHD people, this is the difference between a question that gets asked and a question that stays in their head.
Who notices the difference
The voice feature is not a comfort add-on. It lowers the entry barrier for several groups who often stay silent in text-only tools.
- AuDHD processing: dictation bypasses the inertia block in front of an empty input field.
- Co-occurring dyslexia or dysgraphia: typing carries a disproportionate cost.
- Eye strain on POTS, migraine, fatigue, or post-viral days: extended on-screen reading becomes painful.
- Phases with high masking load: when the language-translation system is already overloaded, an additional modality helps.
- Co-occurring motor conditions such as Ehlers-Danlos syndrome: keyboard sessions are physically limited.
What the feature actually does
Voice input uses the browser's Web Speech API. Depending on the browser vendor, the spoken audio is processed on the device or by the vendor's own speech-recognition service; it never passes through Autistic Mirror's servers. Before first use, a plain-language consent dialog explains what happens where. Privacy details are documented in the privacy notice.
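For readers who want to see the mechanics, here is a minimal sketch of how in-browser dictation via the Web Speech API typically works. The function names (`getRecognitionCtor`, `startDictation`) are illustrative, not Autistic Mirror's actual code; the constructor lookup reflects that Chromium exposes the standard `SpeechRecognition` interface under the `webkitSpeechRecognition` prefix.

```javascript
// Feature-detect the speech recognition constructor.
// Returns null when the browser does not support voice input,
// which is exactly the case where no microphone button is shown.
function getRecognitionCtor(w) {
  return (w && (w.SpeechRecognition || w.webkitSpeechRecognition)) || null;
}

// Start a single dictation session and hand the recognised text
// to a callback. The audio itself never touches the app's servers;
// recognition happens in the browser or at the browser vendor.
function startDictation(w, onResult) {
  const Ctor = getRecognitionCtor(w);
  if (!Ctor) return null; // unsupported browser: feature stays hidden

  const rec = new Ctor();
  rec.interimResults = false; // only deliver final, confirmed text
  rec.onresult = (event) => onResult(event.results[0][0].transcript);
  rec.start();
  return rec;
}
```

In a real page the first argument would be `window`; injecting it as a parameter keeps the logic testable and makes the unsupported-browser path explicit.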
Read-aloud uses the browser's Speech Synthesis API. No external voice service is contacted, no additional server is involved, no recording is created. The voice itself comes from the operating system on the device.
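The read-aloud side can be sketched the same way. This is an illustrative example of the Speech Synthesis API pattern, not the app's actual implementation; in a browser, `synth` would be `window.speechSynthesis` and `Utterance` would be the `SpeechSynthesisUtterance` constructor (injected here so the control flow can be exercised outside a browser).

```javascript
// Read one response aloud using the local speech engine.
// No network call, no recording: the OS voice does the work.
function readAloud(text, synth, Utterance) {
  if (!synth || !Utterance) return false; // unsupported: no read-aloud button

  synth.cancel(); // stop any previous response first; no overlap
  const utterance = new Utterance(text);
  synth.speak(utterance); // queue on the device's local voice engine
  return true;
}

// Per-response stop control. There is no auto-play anywhere,
// so this is only ever triggered by an explicit user action.
function stopReading(synth) {
  if (synth) synth.cancel();
}
```

Calling `cancel()` before `speak()` is the design choice behind "started and stopped per response": at most one response is ever speaking at a time.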
Both features are optional. The microphone button only appears when the browser supports voice input. Read-aloud can be started and stopped per response. There is no auto-play. Anyone who does not want to use the feature sees no difference from the classic text input.
Why the voices do not sound professional yet
The read-aloud responses sound synthetic on most devices. On some systems the voice is flat, on others mechanical, on others usable. This is not a bug. It is a deliberate choice with a clear rationale.
Professional AI voice synthesis (voice cloning at the level of ElevenLabs, OpenAI Voice, or Google WaveNet) produces voices that are barely distinguishable from human recordings. At realistic usage it costs a mid to high three-digit euro amount per month, plus ongoing per-second usage fees. For a solo-funded, credit-based project that is not currently affordable without a noticeable increase in the price per response.
The alternative would be to release the voice feature only when it sounds professional. That strategy would have meant that people with reading or typing barriers would have continued working without this feature, possibly for months. A present, plainly synthetic voice is usable. A missing voice is not usable.
As soon as credit revenue covers the running costs of a professional solution, the voice will be replaced without any change to the feature scope.
Privacy
Voice input is handled through the browser interface. No audio recordings are created on Autistic Mirror's servers. Once confirmed, the recognised text is treated like any normal chat message and is subject to the same security and deletion rules as any other input. The browser vendors (Apple, Google, Mozilla, Microsoft) process the audio data according to their own rules. For Chromium-based browsers, this typically happens server-side at the vendor. The consent dialog before first use states this in plain language.
Read-aloud creates no data that leaves the device. The voice engine runs locally on the device.
A bright spot
Accessibility rarely arrives in one big launch. It arrives when features are released as soon as they are usable, not only when they are polished to enterprise level. The voice feature is a concrete example: rough, honestly labelled, switchable at any time, with no new data flows, no pressure to use it. Anyone who needs it has it from today. Anyone who does not need it sees nothing change.
Autistic Mirror explains autistic neurology individually, related to your situation. For yourself, as a parent, or as a professional.