Testing and independent V&V: trust, but verify

Every program I have run reaches a moment where the build team stands up and says the system is ready. They mean it. They have configured it, walked the demos, and watched the happy path run clean. And they are not wrong, exactly. The system does what they built it to do.

The trouble is that "what they built" and "what the business needs" are two different statements, and only one of them shows up in the demo. The gap between them is where go-live disasters live. Closing that gap is what testing is actually for, and it is why I will not rely on the build team's word alone. Trust, but verify.

Why the build team can't grade its own homework

This is not about dishonesty. The people who configured the system are good at their jobs and they want it to work. The problem is blind spots, and blind spots are structural, not moral.

When you configure something, you build a mental model of how it is supposed to behave. Then you test the paths in that model. You confirm the cases you had in mind and you miss the ones you never imagined, because you cannot test for a scenario you do not know exists. The integrator tests the system they designed. Nobody on that team is incentivized to go looking for the reasons it might fail.

That is the whole case for an independent line of sight. Not because the build team is careless, but because no team can see its own assumptions from the inside.

What independent V&V adds

Independent verification and validation is not a second QA team doing the same scripts a little more carefully. It is a different vantage point, and it has to be set up deliberately or it collapses back into the integrator's view of the world.

Three things make it independent in a way that matters:

A separate line of sight. The V&V function reports to the program, not to the integrator. Its job is to tell you the truth about quality, not to protect a delivery date or a statement of work.
Risk-based coverage. You cannot test everything, so you test what matters. That means concentrating on the processes that move money, touch employees, and carry regulatory or contractual weight, instead of spreading effort evenly across everything, including the processes that do none of those things.
The authority to say "not ready." An assessment nobody is obligated to act on is theater. Independent V&V only earns its keep when it can hold up a hand before go-live and be believed.

That last point is the one that gets negotiated away when schedules tighten. Do not let it be. The value of an independent voice is precisely that it can deliver news the program would rather not hear.
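Of the three, risk-based coverage is the easiest to make concrete. One way to do it is to score each business process on the dimensions named above and spend test effort in score order. The sketch below is illustrative only: the process names, the 0 to 3 scale, and the equal weighting are all assumptions, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    name: str
    moves_money: int        # 0 (none) to 3 (high); hypothetical scale
    touches_employees: int  # same scale
    regulatory_weight: int  # regulatory or contractual exposure, same scale

    @property
    def risk_score(self) -> int:
        # Equal weighting is an assumption; a real program would tune it.
        return self.moves_money + self.touches_employees + self.regulatory_weight


def prioritize(processes):
    """Return processes ordered from highest to lowest risk score."""
    return sorted(processes, key=lambda p: p.risk_score, reverse=True)


# Hypothetical processes and scores, for illustration only.
processes = [
    Process("Report formatting", 0, 1, 0),
    Process("Benefits enrollment", 2, 3, 2),
    Process("Payroll gross-to-net", 3, 3, 3),
]

ranked = prioritize(processes)
for p in ranked:
    print(f"{p.risk_score:>2}  {p.name}")
```

The scoring itself matters less than the conversation it forces: someone has to say out loud which processes get deep coverage and which get a light pass.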

The build team can tell you the system does what they built. Only independent testing tells you it does what the business actually needs. Those are not the same sentence, and the difference is where go-live failures come from.

Test like production, not like a demo

A demo is designed to succeed. Clean records, round numbers, a script that walks straight through the door marked "this works." Production is none of those things, and a test that looks like a demo proves almost nothing.

If you want testing that means something, it has to look like the real world on its worst day. That means production-like data at production-like volumes, not a tidy sample of thirty employees. It means the ugly scenarios the configuration team would rather not think about: retroactive changes, mid-period transfers, off-cycle corrections, the new hire who started, quit, and was rehired inside one pay period.

For payroll specifically, this is non-negotiable. Run parallels against the legacy system and reconcile gross-to-net, person by person, until every variance is explained. Not "close enough." Understood. A penny you cannot account for is a defect you have not found yet, and on payroll the cost of finding it after go-live is paid in trust you do not get back.
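A person-by-person reconciliation can be sketched in a few lines. This is a minimal example under stated assumptions: the employee IDs and amounts are made up, each system's results have already been loaded into a dictionary, and only net pay is compared. A real parallel would also compare gross pay and each deduction line.

```python
from decimal import Decimal


def reconcile_net_pay(legacy, new):
    """Compare net pay per employee and return every variance, however small.

    Both arguments map employee ID -> net pay as Decimal. An employee
    missing from either side is reported as a variance, not skipped.
    """
    variances = {}
    for emp_id in sorted(set(legacy) | set(new)):
        old_amt = legacy.get(emp_id)  # None if absent from the legacy run
        new_amt = new.get(emp_id)     # None if absent from the new run
        if old_amt != new_amt:
            variances[emp_id] = (old_amt, new_amt)
    return variances


# Hypothetical parallel-run results.
legacy = {
    "E001": Decimal("2150.37"),
    "E002": Decimal("1987.10"),
    "E003": Decimal("3020.00"),
}
new = {
    "E001": Decimal("2150.37"),
    "E002": Decimal("1987.11"),  # one penny off
    "E004": Decimal("1500.00"),  # paid in the new system only
}

variances = reconcile_net_pay(legacy, new)
for emp_id, (old_amt, new_amt) in variances.items():
    print(emp_id, old_amt, new_amt)
```

Two choices here are deliberate. Amounts are Decimal rather than float so that a one-cent difference is a real difference and not a rounding artifact, and there is no tolerance parameter, because "close enough" is exactly what the parallel is meant to rule out.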

Entry and exit criteria with teeth

Most test plans fail quietly long before a single script runs, because nobody agreed on what "done" means. So you reach the end of a stage, somebody declares it complete, and the declaration is really just exhaustion wearing a status color.

Define the criteria before you start. What has to be true to enter system test, and what has to be true to exit it and move on. Write down the thresholds: defect counts by severity, the scenarios that must pass, the reconciliations that must balance. Put them in front of everyone while the schedule is still calm.

Then hold the line when it gets uncomfortable, because it will. The reason you set the bar early is so that the decision to lower it has to be made out loud, by name, rather than slipping through because the date is close. And keep one distinction sharp the whole way: "tested" is not "production-ready." Tested means somebody ran it. Production-ready means you would stake the business on the result. Only the second one earns a go.
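Criteria have teeth when they are written as data and checked mechanically, so that "done" is a result and not a mood. The following is a rough sketch, assuming hypothetical thresholds, scenario names, and reconciliation names. The point is the shape, not the specific numbers.

```python
from dataclasses import dataclass


@dataclass
class ExitCriteria:
    max_open_by_severity: dict      # severity -> maximum open defects allowed
    required_scenarios: set         # scenarios that must have passed
    required_reconciliations: set   # reconciliations that must balance


def evaluate_exit(criteria, open_defects, passed_scenarios, balanced_reconciliations):
    """Return a list of unmet criteria; an empty list means the stage may exit."""
    failures = []
    for severity, limit in criteria.max_open_by_severity.items():
        count = open_defects.get(severity, 0)
        if count > limit:
            failures.append(f"{severity} defects open: {count} (limit {limit})")
    for scenario in sorted(criteria.required_scenarios - passed_scenarios):
        failures.append(f"scenario not passed: {scenario}")
    for recon in sorted(criteria.required_reconciliations - balanced_reconciliations):
        failures.append(f"reconciliation not balanced: {recon}")
    return failures


# Hypothetical criteria, agreed before system test begins.
criteria = ExitCriteria(
    max_open_by_severity={"critical": 0, "high": 0, "medium": 5},
    required_scenarios={
        "retroactive pay change",
        "mid-period transfer",
        "rehire within pay period",
    },
    required_reconciliations={"gross-to-net parallel"},
)

# Hypothetical status at the end of the stage.
failures = evaluate_exit(
    criteria,
    open_defects={"critical": 0, "high": 2, "medium": 3},
    passed_scenarios={"retroactive pay change", "mid-period transfer"},
    balanced_reconciliations={"gross-to-net parallel"},
)
for failure in failures:
    print(failure)
```

With a check like this, lowering the bar means editing the criteria themselves, which is a visible act that someone has to own.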

Defect discipline tells you the truth

The status slide will always look fine. Green boxes, a healthy pass rate, a summary that says the program is on track. I have learned not to trust any of it without looking underneath, because the summary is built to reassure and the data is not.

So watch the defects directly. Triage every one by severity so the critical issues cannot hide behind a pile of cosmetic ones. Insist on fix-and-retest, not fix-and-assume, because a fix you did not re-run is a hope, not a result.

Most of all, watch the trend. Plot defect inflow against closure over time. If new defects keep arriving faster than the team closes them, you are not near done no matter what the pass rate says. You are still discovering how much you do not know. When inflow finally drops and closure catches up and stays caught up, the curve is telling you something the summary slide never will: the system is settling down. Believe the curve.
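The curve is simple to compute from two weekly counts. The sketch below uses made-up numbers, and its definition of "settling" (closure at least matching inflow for three consecutive periods) is an assumption, one reasonable reading of "stays caught up" rather than an established rule.

```python
from itertools import accumulate


def open_backlog(opened, closed):
    """Running count of open defects after each period."""
    return list(accumulate(o - c for o, c in zip(opened, closed)))


def is_settling(opened, closed, window=3):
    """True if closure has kept pace with inflow for the last `window` periods.

    The window length is an assumption; pick one that fits your test cycle.
    """
    if len(opened) < window or len(closed) < window:
        return False
    recent = list(zip(opened, closed))[-window:]
    return all(c >= o for o, c in recent)


# Hypothetical weekly counts of defects opened and closed.
opened = [12, 18, 25, 20, 11, 6, 4]
closed = [3, 9, 14, 19, 15, 10, 7]

backlog = open_backlog(opened, closed)
print(backlog)                                # [9, 18, 29, 30, 26, 22, 19]
print(is_settling(opened[:4], closed[:4]))    # False: inflow still outruns closure
print(is_settling(opened, closed))            # True: three straight weeks of catching up
```

Note that the backlog in this example is still 19 at the end. A settling curve says the system is converging, not that it is done, and the exit criteria still decide the go.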