Stochastic Solutions

Errors of Process

It’s all human error
— Walt Patterson

In TDDA’s taxonomy of errors, Errors of Process are the most mundane and the easiest to tackle. Errors of interpretation during the development of an analytical process (errors of formulation) occur because understanding a problem domain and methods is hard. Errors of implementation occur because writing correct software is famously difficult. Errors of applicability are subtle, because it is not always clear what assumptions have been made, or how two contexts differ. And errors of interpretation when the process is used (errors of communication) occur because results can be subtle and hard to explain, and it can be hard to know what needs to be explained. In contrast, errors of process are often just silly mistakes born of fatigue, boredom, confusion, unclear instructions, confusing naming conventions, and the like.

While there is no “silver bullet” for eliminating errors of process, two tools and a plethora of best practices stand out both in analytical settings and more broadly. The tools are checklists and automation (including monitoring).

Checklists

Atul Gawande’s remarkable book The Checklist Manifesto documents countless cases in which a good checklist has been shown to reduce failure rates dramatically. We are all familiar with pilots running through checklists systematically before every flight, but Gawande, a surgeon, starts by talking about the development of a five-point checklist for inserting a central line, a procedure carried out frequently in every hospital around the world. Historically, between ten and twenty per cent of all central lines resulted in infections. A study at Johns Hopkins showed that if the WHO Gold-standard procedure, which has five steps, is reliably followed, that rate drops essentially to zero: the insertion procedure is almost completely reliable and does not result in infections when carried out correctly.

The TDDA library and command-line utilities provide software tools for tackling several of the major classes of errors TDDA identifies. In particular, reference testing helps prevent and identify errors of implementation, and data validation with constraints can be used to monitor pipelines, which helps to counter both errors of applicability and errors of implementation.
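The idea of validating data against constraints can be illustrated without any library. The following is a minimal sketch in plain Python and is not the TDDA library's API: the `verify` function, the rule names (`min`, `max`, `allow_null`), and the sample records are all hypothetical, chosen only for illustration.

```python
def verify(records, constraints):
    """Check each record against per-field rules.

    Returns a list of (field, rule, row_index) tuples, one per failure.
    The rule names here are assumptions for this sketch only.
    """
    failures = []
    for i, record in enumerate(records):
        for name, rules in constraints.items():
            value = record.get(name)
            if value is None:
                # Missing values fail unless the field explicitly allows them.
                if not rules.get("allow_null", False):
                    failures.append((name, "allow_null", i))
                continue
            if "min" in rules and value < rules["min"]:
                failures.append((name, "min", i))
            if "max" in rules and value > rules["max"]:
                failures.append((name, "max", i))
    return failures


# Hypothetical constraints and data for a small customer table.
constraints = {
    "age": {"min": 0, "max": 120},
    "spend": {"min": 0.0, "allow_null": True},
}
records = [
    {"age": 34, "spend": 12.5},
    {"age": -1, "spend": None},
    {"age": None, "spend": 3.0},
]

failures = verify(records, constraints)
for field_name, rule, row in failures:
    print(f"row {row}: field {field_name!r} violates {rule!r}")
```

In a monitored pipeline, a non-empty list of failures would typically be passed on to whatever alerting is in place rather than just printed.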

The other major categories of errors are harder to tackle with software, so TDDA includes 22 checklists, available as part of the TDDA Book and online in various formats. These cover all parts of the analytical cycle.

The most relevant checklists are:

Also highly relevant are:

Automation and Monitoring

Automation is the other key to reducing errors of process. Humans, to different degrees, are prone to boredom, fatigue, day-dreaming, running on autopilot, misreading things and so forth, whereas computers are exceptionally good at performing repetitive tasks while suffering none of those failure modes (at least, when not running large language models). A careful, well-monitored automation can lock in a correct process and reduce error rates dramatically, taking advantage of the things computers are best at.

While automation is undoubtedly a key to reducing errors, a poor automation, or one that is not robust, can cause havoc on a scale no human can, because blindly executing the wrong instructions at speed and scale is a recipe for disaster. Robust, safe automation requires careful monitoring and alerting, with systems that pause or disable themselves when monitoring suggests things are going wrong. A critical and subtle component of this is defining severity levels that strike the right balance between automatic shutdown, interruptive alerting, logging, and periodic review, so that operators do not become desensitised by a constant stream of undifferentiated problems, most of which are minor.
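One way to picture severity levels is as a routing rule that decides what happens to each reported problem. The sketch below is a minimal illustration under stated assumptions: the four levels, the `Monitor` class, and the pipeline steps are all hypothetical, and a real system would need persistent logs and a real alerting channel.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity levels, ordered from least to most serious."""
    INFO = 1      # logged only; seen at periodic review
    WARNING = 2   # logged only; flagged for periodic review
    ERROR = 3     # logged and raised as an interruptive alert
    CRITICAL = 4  # logged, alerted, and the pipeline halts itself


@dataclass
class Monitor:
    """Routes each reported problem to an action based on its severity."""
    log: list = field(default_factory=list)
    alerts: list = field(default_factory=list)
    halted: bool = False

    def report(self, severity, message):
        # Everything is logged, so periodic review sees the full picture.
        self.log.append((severity, message))
        if severity >= Severity.ERROR:
            # Only serious problems interrupt an operator.
            self.alerts.append(message)
        if severity >= Severity.CRITICAL:
            # The most serious problems stop the automation itself.
            self.halted = True

    def run(self, steps):
        """Run (name, step) pairs in order, stopping once halted."""
        completed = []
        for name, step in steps:
            if self.halted:
                break
            step(self)
            completed.append(name)
        return completed


# Hypothetical pipeline steps that report to the monitor as they run.
def load(monitor):
    monitor.report(Severity.INFO, "loaded 1000 rows")


def validate(monitor):
    monitor.report(Severity.CRITICAL, "customer id column is missing")


def publish(monitor):
    monitor.report(Severity.INFO, "published results")


monitor = Monitor()
completed = monitor.run([("load", load), ("validate", validate), ("publish", publish)])
```

Here the critical problem raised during validation halts the run before the publish step executes, while the informational message from the load step is only logged and never interrupts anyone.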

Other Key Practices

In addition to these two central pillars of process improvement, there are other specific practices that help to avoid errors of process. Key in data science are:

The culture of an organization is also critical for reducing, detecting, and correcting errors of process. A “no-blame” culture makes it much more likely that staff will report, highlight, and work to fix errors of process and their causes than is the case in a more finger-pointing, blame-attributing organization.

Errors of Process are discussed at length in Chapter 16 of the TDDA Book.

Copyright © Stochastic Solutions Limited 2007–2026.