Alert triage with AI: cutting noise without cutting signal.

Transaction monitoring runs on rules. Rules produce alerts. Alerts swamp analysts. The figure commonly quoted across the industry is that roughly 95% of alerts close as false positives, and the human cost of that ratio is enormous. AI looks attractive as a way to compress the noise, but the failure mode is also obvious: a model that closes alerts too aggressively buries a real one. This post explains how we use the model to triage alerts at Veetso without giving it the authority to close them.

Key facts

01: Industry baseline. Roughly 95% of transaction-monitoring alerts close as false positives. The cost is analyst hours and missed real alerts as concentration drifts away from the top of the queue.
02: What we optimise. Two ratios: first-read precision (the share of real alerts among those the analyst opens first) and tail recall (the share of filed SARs the model ranked in the bottom half of the queue, which must stay near zero).
03: Model never closes alerts. The model reorders the queue and adds context. The analyst still reviews every alert the rule engine generates; the close / escalate / SAR decision is the analyst's.
04: Calibration per alert type. Structuring alerts are rare-but-real; high-volume cash alerts are common-but-benign. Calibrating globally would push rare alerts down the queue, so each type is calibrated independently.

What we are optimising for

The honest target is not "fewer alerts". That is easy and dangerous. The target is higher signal-to-noise on the alerts the analyst reads first, with no reduction in the number of real alerts the analyst sees.

We measure two ratios continuously:

First-read precision. Of the alerts the analyst opens first (the ones the model surfaces as highest priority), what share turn out to be real concerns?
Tail recall. Of the alerts that ended up being filed as suspicious activity reports (SARs), what share did the model rank in the bottom half of the queue? Despite the name, this is the model's miss rate: lower is better, and it has to stay near zero.

A model that improves first-read precision while keeping tail recall flat is helping. A model that improves precision by pushing real alerts down the queue is hurting.
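As a minimal sketch, the two ratios could be computed from reviewed alerts as follows. This is an illustration under stated assumptions, not Veetso's implementation: the ReviewedAlert fields and the first_read_size parameter (how many top-ranked alerts count as the "first read") are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedAlert:
    """One alert after analyst review. Field names are hypothetical."""
    alert_id: str
    rank: int            # 1 = top of the queue as ordered by the model
    real_concern: bool   # the analyst judged the alert a real concern
    sar_filed: bool      # the analyst filed a SAR on it


def first_read_precision(alerts, first_read_size):
    """Share of real concerns among the top `first_read_size` alerts."""
    first = [a for a in alerts if a.rank <= first_read_size]
    if not first:
        return 0.0
    return sum(a.real_concern for a in first) / len(first)


def tail_recall(alerts):
    """Share of SAR-filed alerts ranked in the bottom half. Lower is better."""
    sars = [a for a in alerts if a.sar_filed]
    if not sars:
        return 0.0
    cutoff = len(alerts) / 2  # ranks above this value are the bottom half
    return sum(a.rank > cutoff for a in sars) / len(sars)
```

Tracking both numbers together is the point: a change that raises the first function's result while also raising the second has improved the top of the queue by moving filed SARs into the bottom half.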

How the model fits in

The model sits between the rule engine and the analyst, not on either end. The rule engine still generates every alert it would have generated before. The analyst still reviews every alert the rule engine generates. What changes is the order in which the analyst sees them, and the context attached to each one.

Ordering

The model produces a priority score for each alert based on factors the rule engine cannot weigh: relationship to other recent alerts on the same account, narrative coherence of the underlying transactions, comparison to the customer's own history, comparison to the population of similar customers.

The highest-scoring alerts surface first. The lowest-scoring ones still surface, just later in the queue. None are hidden.
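A minimal sketch of ordering without hiding: the queue is a permutation of the rule engine's alerts, never a filter. The function name and the choice to surface unscored alerts first are assumptions made for illustration, not a description of the production system.

```python
def order_queue(alert_ids, scores):
    """Return every alert id, highest model score first.

    Assumption: an alert the model failed to score is placed at the top
    rather than dropped, so a scoring gap can never hide an alert.
    """
    ordered = sorted(
        alert_ids,
        key=lambda alert_id: scores.get(alert_id, float("inf")),
        reverse=True,
    )
    # Invariant: reordering only. Nothing is added or removed.
    assert sorted(ordered) == sorted(alert_ids)
    return ordered
```

There is deliberately no score threshold anywhere in the function; a threshold is where triage quietly turns into closure.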

Context

For each alert, the model produces a short narrative the analyst reads before opening the underlying transactions: "Customer typically receives salary on the 25th; this transaction is on the 23rd from a different counterparty in a different country." The narrative speeds up the analyst's first read without making the decision for them.

Linking

The model finds related alerts on related accounts that a rule engine would not connect. When a beneficial owner of one business is also a director of another, alerts on both businesses are shown together. The analyst sees the pattern, not just the slice.
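One way to sketch this kind of linking is a union-find over accounts and the people attached to them, so that alerts on accounts sharing an owner or director fall into one group. This illustrates the idea only; the input shapes (alert_accounts, account_people) are hypothetical, and the source does not say which technique is actually used.

```python
from collections import defaultdict


def link_alerts(alert_accounts, account_people):
    """Group alerts whose accounts are connected through shared people.

    alert_accounts: dict of alert_id -> account_id
    account_people: dict of account_id -> set of person ids
                    (beneficial owners, directors, and so on)
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Connect each account to every person attached to it.
    for account, people in account_people.items():
        for person in people:
            union(("account", account), ("person", person))

    groups = defaultdict(list)
    for alert_id, account in alert_accounts.items():
        groups[find(("account", account))].append(alert_id)
    return sorted(sorted(group) for group in groups.values())
```

For example, if one person is a beneficial owner of one business and a director of another, alerts on both businesses land in the same group, while an alert on an unrelated account stays on its own.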

Where the model must stop

The model does not close alerts. It does not file SARs. It does not lower customer risk ratings. Those are the analyst's decisions, recorded under the analyst's identity in the audit log.

We hold this line for the same reasons as KYC review: accountability, tail-risk asymmetry, and drift. A model that closes alerts is faster; a system that closes them with no human between the model and the audit log is not allowed to exist inside Veetso.

Calibrating the model against alert types

Different alert types have different base rates. A "structuring" alert is rare and almost always real; a "high-volume cash" alert is common and usually benign. The model has to know these base rates so it does not push the rare-but-real alerts down the queue.

We calibrate the model per alert type rather than globally. Each type has its own precision and recall numbers; the model's ranking inside the type is what matters, not its ranking across types. The analyst's queue is then balanced by the rule engine, which mixes types deliberately so no one type dominates the first read.
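A minimal sketch of ranking inside each type and then mixing types. The round-robin interleave is an assumption chosen for illustration; the source only says that types are mixed deliberately, not how. The tuple layout and function name are also hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def build_queue(alerts):
    """alerts: iterable of (alert_id, alert_type, score) tuples.

    Scores are compared only against alerts of the same type, then the
    per-type rankings are interleaved round-robin.
    """
    by_type = defaultdict(list)
    for alert_id, alert_type, score in alerts:
        by_type[alert_type].append((score, alert_id))

    # One lane per alert type, best-scoring alert of that type first.
    lanes = [
        [alert_id for _, alert_id in sorted(items, key=lambda t: t[0], reverse=True)]
        for _, items in sorted(by_type.items())
    ]

    queue = []
    for round_ in zip_longest(*lanes):
        queue.extend(alert_id for alert_id in round_ if alert_id is not None)
    return queue
```

With three cash alerts scored 0.9, 0.8, and 0.7 and a single structuring alert scored 0.3, a global sort would put the structuring alert last; the interleave puts it second, because it is the top alert within its own type.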

What the audit log captures

Every alert, every model score, every analyst decision, and every escalation flows into the audit log. The log entry for each alert contains:

The rule that fired.
The model score and the features that drove it.
The analyst who reviewed it.
The disposition (close, escalate, file SAR).
The time-to-decision.

The regulator can sample any disposition and reconstruct the analyst's reasoning, including the model context the analyst saw. The chain is unbroken, exactly the same way it is for Brain answers.
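The fields listed above could be captured in a record like the one sketched here. The class, field names, and JSON serialisation are hypothetical and shown only to make the shape concrete; the one rule it enforces, that a disposition cannot be recorded without a named analyst, follows from the line the post draws around the model.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Disposition(str, Enum):
    CLOSE = "close"
    ESCALATE = "escalate"
    FILE_SAR = "file_sar"


@dataclass(frozen=True)
class AuditEntry:
    """One audit-log record per alert. All names are illustrative."""
    alert_id: str
    rule_id: str                # the rule that fired
    model_score: float
    score_features: dict        # features that drove the score
    analyst_id: str             # the human who made the decision
    disposition: Disposition
    seconds_to_decision: float

    def __post_init__(self):
        # The model never decides, so an entry without a named analyst is invalid.
        if not self.analyst_id:
            raise ValueError("a disposition must be recorded under a named analyst")

    def to_json(self):
        record = asdict(self)
        record["disposition"] = self.disposition.value
        return json.dumps(record, sort_keys=True)
```

Keeping the score and its features in the same record as the disposition is what lets a sampled decision be reconstructed later with the context the analyst saw at the time.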

Where we have seen this go wrong elsewhere

The failure mode at most banks that bring AI into transaction monitoring is the same: the model is given closure authority on low-scored alerts. Precision goes up, throughput goes up, and management is happy. Then a real alert closes silently, the next quarter brings an enforcement action, and the model is rolled back to triage-only with the band-aid of "human in the loop". This is the wrong order to learn it in.

We start with triage-only. If the data later supports auto-closure on a narrow alert type where tail recall has stayed near zero, we can argue that case in the steering committee. We will not put it in place by default.

FAQ

Questions readers ask

Does AI close transaction-monitoring alerts at Veetso?

No. The model triages (it reorders the analyst's queue and adds context to each alert), but the close, escalate, or file-SAR decision is always the analyst's. We hold this line for accountability (the regulator wants a named human), tail-risk asymmetry (a model that is wrong 1% of the time on the alerts that matter most is worse than a cautious human), and drift (model behaviour shifts as fraud typologies do).

How does AI reduce false positives in AML?

It does not, in the sense that the rule engine still generates the same alerts. What changes is the order in which the analyst sees them: highest-scoring alerts surface first, lower-scoring ones surface later, none are hidden. First-read precision rises because the analyst opens the most likely real alerts first.

What is first-read precision in alert triage?

The share of alerts the analyst opens first (those the model ranked highest) that turn out to be real concerns. Rising first-read precision means the analyst's early time is spent on the alerts that matter. We measure it continuously alongside tail recall to ensure the model is not gaming precision by pushing real alerts down the queue.

What is tail recall and why does it matter?

The share of alerts that ended up being filed as SARs that the model ranked in the bottom half of the queue. This is the model's miss rate, and it must stay near zero. A model that improves first-read precision while harming tail recall is making the analyst's queue look better while burying the alerts that matter most.

How is the model audited?

Every alert, model score, analyst decision, and escalation flows into the same audit log we use for the rest of the AI surface. The regulator can sample any disposition and reconstruct the analyst's reasoning, including the model context the analyst saw. The chain is unbroken end to end.