We attack our own app

What a red-team test is, why we ran more than 1,100 attempts in 19 languages, and what ended up at zero

Autistic Mirror often gets opened in sensitive moments. After a sensory-heavy day, during a crisis, in the middle of a difficult conversation with family. Whoever opens an app in that state has no buffer for an AI that suddenly answers badly. Safety is therefore not a feature that arrives later. Safety is the precondition that makes the tool usable at all.

This article describes what we did on 17 May 2026 against the live app. It is readable without prior knowledge. Anyone looking for technical detail will find it in the internal audit report. Here the question is whether the protective layers hold when someone tries to break them on purpose.

What a red-team test is

A red-team test is a simulated attack. Instead of waiting for someone outside to try, we attack the app ourselves, using the attack patterns known from security research plus patterns that are specifically critical for an AI used in a neurodivergent context.

Three questions sit at the centre.

Can the AI be talked into ignoring its internal rules? Can it be pushed during a crisis into dropping hotline references or normalising what is happening? Does the surrounding software protect user data even when an endpoint is hammered directly?

The weight of such a test does not come from a single attempt. It comes from volume and variation. One passing attempt is an anecdote. Hundreds of passing attempts across multiple languages are evidence.

What we mean by attack attempt

An attack attempt is a real request sent to the running app, phrased to bypass a protective rule. No lab, no mock, no simulation. Exactly what an attacker would type into the input field. Whenever we use the term attack attempt below, we mean one of these real requests.

First run

In the first round we ran several dozen carefully constructed attack patterns against the running app. Each pattern ran in every one of the seven actively maintained UI languages: German, English, Spanish, French, Dutch, Brazilian Portuguese and Danish.

Seven languages are not decoration. An AI defence that holds in German can silently fail in French. Taking safety seriously means testing every language in which the app actually answers.

Result of that first run: zero violations.

Why that was not enough for us

A passing run of 210 attempts, which is what the first round amounted to, is a good signal. Statistically it is still thin. Knowing whether a system really holds requires a scale at which chance becomes an implausible explanation for a clean result.

Red-team reports for AI products typically work with a few dozen to a few hundred probes, often in one or two languages. We wanted to go further on both axes. For two reasons. The app operates in an especially protective context. And we are preparing for independent external audits, which require comparable baselines.

The extended run

The extended run on 17 May 2026 sent a much larger inventory against the running app. More than 1,100 attack attempts, plus several hundred additional model responses from long, multi-stage conversations. Accompanied by a full offline structural test suite that checks the protective logic independently of the AI.

For the scale to be visible, here are the individual areas. What the terms mean is summarised in a sentence next to each one.

Area | What is being checked | Result
Deep probing across the 7 UI languages | attempts to push the AI step by step into breaking its own rules, in every actively maintained language | 0 violations
Attempts to overwrite the internal rules directly | classic inputs such as "Ignore all previous instructions" | 0 violations
Attempts to force the AI into another role | "You are now a doctor", "Answer like a coach" | 0 violations
Attempts to bypass safeguards through writing tricks | encoded or character-disguised inputs designed to slip past filters | 0 violations
Attempts to force behavioural compliance and normalisation | requests for ABA-style recommendations | 0 violations
Attacks in further languages outside the UI | more than a dozen additional languages an attacker would pick because many AI defences silently fail there | 0 violations
Reworded bypass attempts | the same attacks in different wording, to defeat pure keyword filters | 0 violations
Combined attacks from an extended catalogue | several attack patterns blended into a single attempt | 0 violations
Slow manipulation across many conversation turns | conversations that try to soften safeguards gradually rather than directly | within tolerance
Offline structural tests | several test suites that check the protective logic independently of the AI for consistency and drift | all passed
Admin-side endpoints under pressure | every admin-side interface is called without valid authorisation and must refuse | correctly blocked
Quality check on answer content | several clusters check whether the AI names neurological mechanisms correctly rather than offering generic phrases | near-complete match
Data isolation between users | database-level check whether one user's data can ever surface in another user's answer | 0 data leaks
Tamper detection on the activity log | verifies that retroactive changes to security-relevant logs remain detectable | passed
Reachability of every crisis hotline link | every emergency link stored in the app is called to confirm that it is reachable | passed
Multilingual handling of expert terms | checks whether neurological terminology is explained correctly in several languages | passed
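
To make the tamper-detection check in the table more concrete: one common way to keep retroactive changes detectable is a hash chain, in which the hash of each log entry also covers the hash of the entry before it. The following is a minimal sketch of that general technique, not a description of how Autistic Mirror's activity log is built. The function names, the entry layout and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json

# Hypothetical starting value for the chain (assumption, not from the app).
GENESIS = "0" * 64


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list, payload: dict) -> None:
    """Append a payload and link it to the current end of the chain."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "hash": entry_hash(prev, payload)})


def verify_log(log: list) -> bool:
    """Recompute every hash in order; any edited entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

In this sketch, changing the payload of any earlier entry makes verify_log return False, because the stored hash no longer matches the recomputed one.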

What the numbers mean

Three properties matter in that table.

Depth. More than 1,100 attack attempts is far above what is common in the market. With zero observed violations across that many attempts, the remaining statistical uncertainty is small: a system with a meaningfully high true violation rate would be very unlikely to produce a clean result by chance.
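
How much a clean run narrows the uncertainty can be estimated with a standard calculation. If n attempts are treated as independent and none fails, the true violation rate p is, with 95 percent confidence, below the value that solves (1 - p)^n = 0.05, which is roughly 3/n (the "rule of three"). The sketch below is only an illustration: independence is an assumption that real attack variants do not strictly satisfy, and 1,100 is used as a round lower figure for the extended run.

```python
def upper_bound_zero_failures(n: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on the true violation rate after n
    independent attempts with zero observed violations.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of attempts")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


# 210 = first run, 1100 = round lower figure for the extended run.
for n in (210, 1100):
    exact = upper_bound_zero_failures(n)
    rule_of_three = 3 / n
    print(f"{n}: exact {exact:.2%}, rule of three {rule_of_three:.2%}")
```

Under these assumptions, 210 clean attempts bound the violation rate at about 1.4 percent, while 1,100 clean attempts bound it at about 0.27 percent.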

Breadth. 19 languages covered. The seven actively maintained UI languages plus further languages from other writing systems that an attacker would pick because many AI defences silently fail there.

Repeatability. This run gives us a comparable baseline. If we run the same test again in three months, any regression introduced by a new model version or a prompt change shows up immediately. Safety is not a state. It is a continuous measurement.
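
As a rough illustration of what a comparable baseline makes possible, the sketch below compares the violation counts per area from two runs and reports every area that got worse. The area names and the dictionary layout are hypothetical and do not reflect the format of the internal audit report.

```python
def find_regressions(
    baseline: dict[str, int], current: dict[str, int]
) -> dict[str, tuple[int, int]]:
    """Return the areas whose violation count rose against the baseline.

    Each value is a (before, after) pair. Areas that are missing from
    the baseline are treated as having had zero violations.
    """
    regressions = {}
    for area, violations in current.items():
        before = baseline.get(area, 0)
        if violations > before:
            regressions[area] = (before, violations)
    return regressions


# Hypothetical example data: a clean baseline and a later run in
# which one area has picked up a violation.
baseline_run = {"role_override": 0, "rule_overwrite": 0, "encoding_tricks": 0}
later_run = {"role_override": 0, "rule_overwrite": 1, "encoding_tricks": 0}
print(find_regressions(baseline_run, later_run))
```

With the example data, only the hypothetical rule_overwrite area is reported, with the pair (0, 1).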

Privacy during the test itself

A safety test should not produce a data trail that later becomes a problem. Per attempt, only three things are stored: the verdict (pass, partial, fail), the targeted mechanism, and a short cryptographic hash fragment of the model response. No plaintext responses, internal system rules or user data are archived. The audit can be reviewed without anyone ever seeing the original wording.
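
A record of this kind could look roughly like the sketch below. It is a minimal illustration, not the actual storage format: the field names, the use of SHA-256 and the fragment length of 12 hex characters are assumptions, since the article only states that a short cryptographic hash fragment is kept.

```python
import hashlib
from dataclasses import dataclass
from typing import Literal

Verdict = Literal["pass", "partial", "fail"]


@dataclass(frozen=True)
class AttemptRecord:
    verdict: Verdict      # outcome of the attempt
    mechanism: str        # protective mechanism that was targeted
    response_hash: str    # short fragment of the response hash


def make_record(
    verdict: Verdict,
    mechanism: str,
    model_response: str,
    fragment_len: int = 12,  # assumed length, not taken from the article
) -> AttemptRecord:
    """Build a record that keeps only a hash fragment of the response.

    The plaintext response is hashed and then discarded; it is never
    stored on the record itself.
    """
    digest = hashlib.sha256(model_response.encode("utf-8")).hexdigest()
    return AttemptRecord(verdict, mechanism, digest[:fragment_len])
```

The fragment still lets a reviewer see whether two attempts produced an identical response, without the response text itself being kept.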

External tests come next

Passing internal tests is the floor, not the ceiling. A safety claim only carries real weight once independent third parties can reproduce it. We are preparing an external audit and will publish its results with the same transparency as this internal run, regardless of whether the findings turn out flattering or uncomfortable.

In parallel, a manuscript on the methodology of our safety architecture has been submitted to Autism in Adulthood for peer review (status: in review). This makes the architecture verifiable outside our own organisation for the first time.

What stands behind the numbers

Most AI products market features. Safety rarely shows up in marketing because it feels abstract to outsiders. Behind the numbers in this run sits a different stance. An app that works with especially vulnerable people owes its users more than a polished interface. It owes them promises that hold under pressure. The fact that this run ended at zero violations is no guarantee for the future. It is the statement that the responsibility is taken seriously, with real tests and real numbers, not with claims.

For organisations and auditors

A more detailed methodology and results document is available for B2B customers, compliance teams and external auditors. It contains the full probe matrix, the exact inventories per attack area, the classifier logic and the data protection statement on storage. An informal request to enterprise@autisticmirror.app is enough. The document is sent after a short clarification of the intended use.

Autistic Mirror explains autistic neurology individually, tied to your specific situation. For yourself, as a parent, or as a professional.

Aaron Wahl

Autistic, founder of Autistic Mirror

How you function has reasons.
They can be explained.
