Testability Is a Design Property, Not a Testing Strategy

In April 2026, the ThoughtWorks Technology Radar Volume 34 did something unusual for a report that typically tracks new tools and frameworks: it highlighted testability as one of the engineering disciplines most worth returning to. Not a new testing framework. Not a novel approach to QA. Testability itself: the property that determines whether code can be verified in isolation, with confidence, without heroic setup. The reason it warranted calling out is the same reason engineering fundamentals have a way of reasserting themselves: teams are generating more code than ever, discovering it is harder to test than ever, and diagnosing the problem in the wrong place.

The standard response to a struggling test suite is to add more tests, invest in better tooling, or hand more of the testing burden to the QA function. Each of these can help at the margins. None of them addresses why the tests are hard to write in the first place. That question has an architectural answer.

Where the diagnosis goes wrong

Test maintenance cost is a lagging indicator. By the time a team notices that their test suite is brittle, expensive to extend, or running for twelve minutes on a codebase that shipped nothing significant this sprint, the structural decisions that caused it are months old and merged into thousands of files. The symptoms land in the test layer. The causes live in the design.

A component that cannot be tested in isolation without standing up a database, a message broker, and two external APIs is not suffering from a lack of tests. It is suffering from a dependency structure that prevents isolation. A service that produces different outputs for the same inputs depending on the time of day does not have a testing problem to solve with mocking libraries. It has a determinism problem to solve by making time a parameter. A module that cannot be exercised without triggering an email to a real customer is not missing test doubles. It is missing a boundary between business logic and side-effect execution.

Each of these failures is architectural. Each one was established, intentionally or not, before the first test was written.

Barry Boehm’s work on software development economics is the usual source for the gradient: a defect caught in the design phase costs far less to fix than the same defect found in production, with a multiplier often quoted at roughly 30, although the exact ratio varies considerably by study and system size. The figure gets cited most often in discussions about QA investment. It applies equally to testability decisions, because the cost curve is the same: a design choice that reduces testability behaves like a defect introduced before any test exists to catch it, and the remediation cost compounds with every change built on top of it.

Four properties that determine testability before a test is written

Testability is not evenly distributed across a codebase, and it is not binary. It is the aggregate of four specific architectural properties, each evaluable at design time and each with a characteristic failure mode.

Dependency transparency is whether a component’s collaborators are declared explicitly and can be substituted, or are constructed and acquired internally. When a class instantiates its own database connection, calls a global service locator, or reads credentials from the environment inside a business method, the dependency is hidden. A test cannot reach in and replace it with a lightweight stand-in. Transparent dependencies arrive through the constructor or are passed as method arguments, which means a test can supply a fake and verify behavior without production infrastructure.

Side-effect isolation is whether observable changes to the world outside the component (persisting to a database, publishing to a queue, sending a notification) are concentrated in a thin layer that can be verified separately from the logic that decides whether those changes should happen. When business logic and side-effect execution are interleaved, a test of the business logic triggers the side effects. Isolation separates the decision from the execution and makes each verifiable on its own terms.

Deterministic state is whether the component produces the same output given the same input across runs, or whether its behavior depends on external state it does not control: the current time, a random seed, a feature flag read at runtime, a counter in a shared object. Non-determinism is not always avoidable, but it can almost always be pushed to the boundary. Making time a parameter, accepting a random source from outside, and passing feature-flag values as arguments rather than reading them inline are all design moves that restore determinism to the core logic while keeping the non-determinism visible and manageable.
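As a minimal sketch of making time a parameter, the example below injects a java.time.Clock into a small policy class. The HappyHourPolicy class and its 17:00 to 19:00 discount rule are hypothetical and exist only for illustration; Clock, Clock.fixed, and LocalTime.now(Clock) are standard library.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneOffset;

// Demo entry point. Production code would pass Clock.systemUTC();
// a test pins the clock to a known instant instead.
class HappyHourDemo {
    public static void main(String[] args) {
        Clock halfPastFive = Clock.fixed(Instant.parse("2026-01-15T17:30:00Z"), ZoneOffset.UTC);
        Clock noon = Clock.fixed(Instant.parse("2026-01-15T12:00:00Z"), ZoneOffset.UTC);

        System.out.println(new HappyHourPolicy(halfPastFive).discountPercent()); // 20
        System.out.println(new HappyHourPolicy(noon).discountPercent());         // 0
    }
}

// Hypothetical policy: 20% off from 17:00 (inclusive) to 19:00 (exclusive).
class HappyHourPolicy {
    private final Clock clock;

    HappyHourPolicy(Clock clock) {
        this.clock = clock;
    }

    int discountPercent() {
        // The current time comes from the injected clock, never from a global source.
        LocalTime now = LocalTime.now(clock);
        boolean happyHour = !now.isBefore(LocalTime.of(17, 0)) && now.isBefore(LocalTime.of(19, 0));
        return happyHour ? 20 : 0;
    }
}
```

The non-determinism has not disappeared: it now lives at the one place where the production clock is chosen, and the core logic gives the same answer for the same clock on every run.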

Seam availability builds on Michael Feathers’ term for a place in a system where behavior can be altered without editing the code at that place. In object-oriented systems, a seam exists at every interface boundary where a different implementation can be supplied at construction. In functional systems, a seam exists wherever a function is passed as a parameter. When code has no seams, tests cannot vary the component’s behavior without modifying its source. Working with a legacy codebase almost always means finding seams that exist but have not been exposed, or introducing seams where none exist.
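A function-parameter seam can be sketched in a few lines. The ShippingQuote class, the rates, and the country codes below are hypothetical; the point is only that the caller varies the behavior without touching the method's source.

```java
import java.util.function.ToIntFunction;

class SeamDemo {
    public static void main(String[] args) {
        // Production would pass a lookup backed by a real carrier service (not shown).
        // Each lambda here is an alternative behavior supplied through the seam.
        ToIntFunction<String> flatRate = country -> 500;
        ToIntFunction<String> byCountry = country -> country.equals("DE") ? 300 : 900;

        System.out.println(ShippingQuote.totalCents(2_000, "DE", flatRate));  // 2500
        System.out.println(ShippingQuote.totalCents(2_000, "DE", byCountry)); // 2300
    }
}

class ShippingQuote {
    // The rateLookup parameter is the seam: behavior changes at the call site,
    // not by editing this method.
    static int totalCents(int subtotalCents, String country, ToIntFunction<String> rateLookup) {
        return subtotalCents + rateLookup.applyAsInt(country);
    }
}
```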

Three anti-patterns with characteristic symptoms

Three design choices account for most of the testability problems in production codebases. Each has a symptom that appears in the test suite before it appears in any architecture review.

Hidden construction is the practice of instantiating dependencies inside the component rather than accepting them as parameters. The symptom in tests is long setup code, heavy use of static interceptors, or tests that must modify global state before they can run. Every line of test setup that exists to work around a hidden dependency is setup that would not exist if the dependency were declared. The fix is not a more powerful mock framework: it is moving dependency acquisition to the constructor or call site.

// Hidden construction: the service decides where to persist, making tests
// dependent on the real database or requiring static interception of the
// PostgresOrderRepository constructor.
public class OrderService {
    private final OrderRepository repo = new PostgresOrderRepository(
        System.getenv("DB_URL"), System.getenv("DB_USER")
    );

    public void submit(Order order) {
        validate(order);
        repo.save(order);
    }
}

// Transparent dependency: the repository arrives from outside.
// A test passes an in-memory stand-in; production wires the real thing.
// Nothing about OrderService's logic changes; only where it gets its
// collaborator from does.
public class OrderService {
    private final OrderRepository repo;

    public OrderService(OrderRepository repo) {
        this.repo = repo;
    }

    public void submit(Order order) {
        validate(order);
        repo.save(order);
    }
}
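To make the payoff concrete, here is a rough sketch of exercising the injected version with an in-memory stand-in. It assumes Java 16 or later for records. The Order fields, the validation rule, and InMemoryOrderRepository are all invented for illustration, since the snippets above do not define them.

```java
import java.util.ArrayList;
import java.util.List;

class OrderServiceDemo {
    public static void main(String[] args) {
        InMemoryOrderRepository repo = new InMemoryOrderRepository();
        OrderService service = new OrderService(repo);

        service.submit(new Order("A-1", 2));
        System.out.println(repo.saved().size()); // 1

        try {
            service.submit(new Order("A-2", 0)); // fails validation, never reaches the repository
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(repo.saved().size()); // 1
    }
}

// Hypothetical order shape, for illustration only.
record Order(String id, int quantity) {}

interface OrderRepository {
    void save(Order order);
}

// Lightweight stand-in: no database, no environment variables.
class InMemoryOrderRepository implements OrderRepository {
    private final List<Order> saved = new ArrayList<>();

    @Override
    public void save(Order order) {
        saved.add(order);
    }

    List<Order> saved() {
        return List.copyOf(saved);
    }
}

class OrderService {
    private final OrderRepository repo;

    OrderService(OrderRepository repo) {
        this.repo = repo;
    }

    void submit(Order order) {
        validate(order);
        repo.save(order);
    }

    // Assumed rule; the original snippet leaves validate undefined.
    private void validate(Order order) {
        if (order.quantity() <= 0) {
            throw new IllegalArgumentException("quantity must be positive");
        }
    }
}
```

The whole check runs in memory, with no setup beyond constructing two objects.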

Side-effect entanglement is the absence of a boundary between logic and observable output. The symptom is tests that send real notifications to test accounts, write to live queues, or require filesystem cleanup after every run. The fix is to separate the decision layer from the execution layer. A method that returns a description of what should happen (send notification X to user Y with message Z), rather than directly causing it to happen, lets a test verify the decision by inspecting the returned value instead of the state of the outside world. The execution layer is then thin enough that a shallow integration test covers it adequately.
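One way to picture the split, under some assumptions: the decision layer returns a value describing the notification, and a thin sender interface carries it out. OverduePolicy, the seven-day threshold, and NotificationSender are hypothetical names, not part of any real library. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class NotificationDemo {
    public static void main(String[] args) {
        // Decision: a pure function whose result can be inspected directly.
        Optional<Notification> decision = OverduePolicy.decide("user-42", 10);
        System.out.println(decision.isPresent()); // true

        // Execution: a thin layer. This recording sender collects messages
        // instead of emailing anyone; production would wire a real sender.
        List<String> outbox = new ArrayList<>();
        NotificationSender recording = n -> outbox.add(n.userId() + ": " + n.message());
        decision.ifPresent(recording::send);
        System.out.println(outbox.size()); // 1
    }
}

// A description of what should happen, not the act of doing it.
record Notification(String userId, String message) {}

interface NotificationSender {
    void send(Notification notification);
}

class OverduePolicy {
    // Hypothetical rule: notify once an invoice is at least 7 days overdue.
    static Optional<Notification> decide(String userId, int daysOverdue) {
        if (daysOverdue < 7) {
            return Optional.empty();
        }
        return Optional.of(new Notification(userId, "Your invoice is " + daysOverdue + " days overdue"));
    }
}
```

Testing the policy needs no sender at all, and testing a real sender needs no policy.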

Non-deterministic seam is the introduction of uncontrolled external state into core logic paths. The symptom is a flaky test whose failure rate correlates with time of day, environment load, or test execution order. The fix is to make the non-determinism injectable: a Clock interface rather than LocalDateTime.now(), a RandomSource parameter rather than a static call to Math.random(), a FeatureContext argument rather than a flag service invoked inline. None of these patterns are novel. They are standard practice in teams that have hit the flakiness problem once and decided not to hit it a second time.
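The random-source and feature-flag cases can be sketched together. This example uses a plain java.util.Random argument in place of a dedicated RandomSource type, and a boolean in place of a FeatureContext. Both substitutions, along with the Rollout class and its percentage rule, are simplifying assumptions.

```java
import java.util.Random;

class RolloutDemo {
    public static void main(String[] args) {
        // Two generators with the same seed produce the same sequence,
        // so the outcome is reproducible across runs.
        boolean first = Rollout.showNewCheckout(true, 25, new Random(42L));
        boolean second = Rollout.showNewCheckout(true, 25, new Random(42L));
        System.out.println(first == second); // true

        // The flag value arrives as an argument rather than from an inline service call.
        System.out.println(Rollout.showNewCheckout(false, 100, new Random(42L))); // false
    }
}

class Rollout {
    // Hypothetical rule: with the flag on, show the new checkout to roughly
    // rolloutPercent of calls, drawn from the injected random source.
    static boolean showNewCheckout(boolean flagEnabled, int rolloutPercent, Random random) {
        if (!flagEnabled) {
            return false;
        }
        return random.nextInt(100) < rolloutPercent;
    }
}
```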

The testability spectrum

These four properties combine into a spectrum of four tiers, running from Structurally Untestable to Design-for-Testability. Most production code lives somewhere in the middle rather than at either extreme. The practical value of naming the tiers is that they make the refactoring conversation concrete: not “we need better testing” but “this service is in tier two for dependency transparency, and moving it to tier three requires these three constructor changes over the next sprint.”

Property	Structurally Untestable	Incidentally Testable	Seam-Aware	Design-for-Testability
Dependency transparency	Constructed internally or resolved via global lookup	Mixed: some injected, some hidden	All dependencies declared; most injectable	All dependencies injected through constructor or parameters; interfaces over concretions
Side-effect isolation	Business logic and side effects interleaved throughout	Side effects partly separated; some logic remains coupled	Clear boundary between decision and execution layers	Execution layer is thin, swappable, and verifiable independently from business logic
Deterministic state	Time, randomness, and feature flags read inline from global sources	Non-determinism isolated to some modules; others remain coupled	Non-determinism injectable in most paths	All external state arrives as parameters; core logic is deterministic across all runs
Seam availability	No seams; changing behavior requires editing source	Some interfaces exist but are not used consistently	Seams available at major architectural boundaries	Seams designed in at every boundary; behavior can be varied for any testable path
Test cost signal	Tests require full production infrastructure, or cannot be written	Infrastructure tests coexist with unit tests; setup is heavy and brittle	Unit tests are cheap; integration tests are deliberate and scoped	Unit tests are fast, cheap, and stable; the integration boundary is thin and explicit

Moving up one tier in any single property reduces test maintenance cost for that component. The tiers are not equally expensive to traverse. Moving from Structurally Untestable to Incidentally Testable usually requires identifying and exposing the most painful hidden dependencies: a small number of targeted changes with immediate payoff. Moving from Seam-Aware to Design-for-Testability is a higher investment with more diffuse returns, and in most codebases that tier is the right target for new services, not a mandatory destination for every service already in production.

Trade-offs: when to extract seams and when not to

Testability refactoring is not always the right investment, and the argument breaks down if it becomes a blanket prescription.

Extract seams and invest in testability refactoring when: the test setup for a module costs more than two developer-days per sprint in aggregate, or test flakiness in that module is causing CI to rerun more than once per day on average. In both cases, the interest payment on the testability debt exceeds the cost of addressing it, and the refactoring pays back within a quarter.

Accept coupling and scope the test differently when: the component is thin, stable, and changes infrequently; the real integration test is cheaper to write than the seam extraction; and the blast radius of a defect in that component is bounded enough that an end-to-end test provides adequate coverage. Not every module needs a seam. A twelve-line utility function that reads a config file and returns a parsed struct has no testability problem that requires an interface boundary.
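The extraction thresholds above can be turned into a rough screening rule. The sketch below flags modules that exceed either threshold and orders them by setup cost. The ModuleStats fields, the sample numbers, and the choice to sort by setup days are all illustrative assumptions, not measured data. It assumes Java 16 or later.

```java
import java.util.Comparator;
import java.util.List;

class RefactoringCandidatesDemo {
    public static void main(String[] args) {
        // Made-up figures for illustration only.
        List<ModuleStats> modules = List.of(
            new ModuleStats("billing", 3.5, 0.4),
            new ModuleStats("config-loader", 0.1, 0.0),
            new ModuleStats("notifications", 1.0, 2.5));

        // Prints "billing" then "notifications"; config-loader is below both thresholds.
        TestPenalty.candidates(modules).forEach(m -> System.out.println(m.name()));
    }
}

record ModuleStats(String name, double setupDaysPerSprint, double ciRerunsPerDay) {}

class TestPenalty {
    static final double MAX_SETUP_DAYS_PER_SPRINT = 2.0;
    static final double MAX_CI_RERUNS_PER_DAY = 1.0;

    // A module qualifies if it exceeds either threshold.
    static boolean worthRefactoring(ModuleStats m) {
        return m.setupDaysPerSprint() > MAX_SETUP_DAYS_PER_SPRINT
            || m.ciRerunsPerDay() > MAX_CI_RERUNS_PER_DAY;
    }

    // Qualifying modules, most expensive test setup first.
    static List<ModuleStats> candidates(List<ModuleStats> modules) {
        return modules.stream()
            .filter(TestPenalty::worthRefactoring)
            .sorted(Comparator.comparingDouble(ModuleStats::setupDaysPerSprint).reversed())
            .toList();
    }
}
```

A real ranking would weigh build reliability alongside setup time. The single sort key here is only the simplest thing that produces an ordered list.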

The error that produces the most wasted effort is applying the same testability investment uniformly across a codebase, which dilutes the high-value extractions across low-value components that do not need them. Testability refactoring earns its place as a budgeted activity when it targets the modules where the test penalty is highest.

Starting in a production codebase

The objection that follows most architecture discussions about testability is predictable: these decisions were made two years ago, the blast radius of changing them is too large, and the team does not have the headcount to revisit them. The objection is correct about one thing: you cannot rearchitect a mature service’s dependency graph in a single sprint. It is wrong about what follows from that.

The approach that produces measurable returns in production codebases is to prioritize by test penalty, not by architectural aspiration. The modules where tests are most expensive to write or most expensive to maintain are the modules generating the largest ongoing interest payment on testability debt. They are also the modules where a targeted refactoring produces the most immediate return, because the cost it eliminates is concrete and recurring.

The sequence: identify the components with the highest ongoing test penalty, measured in developer time and build reliability. Within each, identify which tier-one failure is most responsible using the spectrum table. Apply the targeted fix (dependency extraction, side-effect separation, injecting the clock or random source) without trying to reach the full Design-for-Testability profile in a single pass. Track the test cost before and after. The feedback loop is short enough to produce a usable signal within two or three sprints, which is the evidence needed to justify prioritizing the next module.

The global software testing market is estimated to reach $57.73 billion in 2026 (TestGrid, 2026 Software Testing Statistics), reflecting how much engineering investment goes into verifying software after it has been designed. Some of that spending is unavoidable. A material fraction is paying for the privilege of testing code whose design made testing expensive. The ThoughtWorks Radar’s return-to-fundamentals signal is specifically about this: not that testing tooling needs to improve, but that design discipline needs to precede it.

Across the teams we work with on architecture reviews and engineering quality programs, the most consistent finding is not that they are under-testing. It is that they are paying more than they should to test, because testability was never a design criterion. Test coverage is a trailing indicator of software quality. Testability is a leading indicator of how much that coverage will cost to produce and maintain.

If your test suite has become the most expensive maintenance surface in your codebase, or if you are planning a modernization and want to make testability a design criterion from the start rather than discovering its absence twelve months in, we have helped engineering teams run this analysis, locate the high-penalty modules, and make the structural changes that reduce that cost. We are glad to compare notes.

Testability spectrum self-assessment

Dependency transparency: do your components declare all collaborators at construction, or acquire them internally? Hidden acquisition is the most common source of expensive test setup.
Side-effect isolation: is there a clear boundary between the code that decides what should happen and the code that causes it? Entangled side effects make logic tests trigger real infrastructure.
Deterministic state: does core logic read the current time, a random seed, or feature flags inline? Push non-determinism to the edges and make it a parameter.
Seam availability: can a test change a component's behavior without editing source? If not, the component has no seam and cannot be tested in true isolation.
Invest in testability refactoring when test setup exceeds two developer-days per sprint, or when flakiness forces CI reruns daily. Accept coupling when the component is stable, thin, and the integration test is cheaper than the seam.
Prioritize by test penalty, not by architectural aspiration: the module with the most expensive or most brittle test suite is the one where testability debt is generating the highest interest payment.