The QBR opens with outcomes. Revenue uplift is flat. Retention is noisy. Conversion moved in one region and stalled in another. Someone asks the question the dashboard implies it can answer.
Did it work?
That question usually triggers a familiar loop. Central points to the business case and the designed flow. Field teams point to staffing gaps, diary pressure, and market differences. Dealers insist they ran it, then quietly describe the parts they had to compress to survive the week. Everyone leaves with a story, and no one leaves with a diagnosis.
Outcome KPIs arrive too late to explain what happened in a distributed rollout. By the time the curve moves, execution has already drifted. When the curve does not move, you still cannot tell whether the concept underperformed or whether the network never enacted it in a comparable way.
Outcomes do not explain a rollout that fragmented in week one
Network programs rarely fail with a clean stop. They become uneven. One dealer starts immediately, another delays until capacity frees up, another activates a local version that feels “close enough.” The program stays officially live while the operating model splits.
If you measure performance without measuring enactment, you end up interpreting results through belief. Central leaders argue from intent. Dealers argue from constraints. The dashboard becomes a referee that cannot see the match.
Measure enactment, or accept that performance debates will stay political
Most networks try to occupy an unhelpful middle ground: they measure exposure and call it execution.
Training completion, webinar attendance, regional sign-off, toolkit distribution, portal downloads. These signals matter, but they mostly tell you that the network was told.
Execution looks different in a dealership. The work is visible when it has an owner, a place in the daily cadence, and a trace in the system. A program has converted into work when the CRM workflow carries the steps, the morning huddle assigns the follow-up, and someone checks the routine with the same seriousness as pipeline health and workshop load.
That is the threshold you need to verify before you interpret outcomes.
The signals you need come from where drift actually begins
These are not “extra” measures. They are how you keep performance evaluation meaningful in a network.
Early drift usually starts in a small set of places. The activation window stretches because capacity is tight and no one wants to commit. Follow-up discipline collapses into a burst of activity and then disappears when the inbox spikes. Targeting rules get adjusted because the audience feels inconvenient or the diary is already full. Ownership moves between roles until it becomes nobody’s job.
Activation discipline is a good example. It is hard to compare results across the network when half the dealers started on time, a quarter started late, and the rest started with modifications. Making activation visible does not need bureaucracy. It needs a simple, comparable view of when the routine actually began and whether it began in the intended form.
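To make that concrete, here is a minimal sketch of such a view, assuming a per-dealer activation log with hypothetical fields: the program go-live date, the date the routine first shows up in the dealer's system, and whether it started in the designed form. Neither the field names nor the 14-day window come from any specific tool or program; they are placeholders for whatever your rollout tracker actually exports.

```python
# A minimal activation-discipline view: classify each dealer's start as
# on time, late, modified, or not started. All data here is illustrative.
from datetime import date

ACTIVATION_WINDOW_DAYS = 14  # assumed intended window; set to your program's

activation_log = [
    {"dealer": "D-01", "go_live": date(2024, 3, 1), "first_enacted": date(2024, 3, 4),  "as_designed": True},
    {"dealer": "D-02", "go_live": date(2024, 3, 1), "first_enacted": date(2024, 3, 22), "as_designed": True},
    {"dealer": "D-03", "go_live": date(2024, 3, 1), "first_enacted": date(2024, 3, 6),  "as_designed": False},
    {"dealer": "D-04", "go_live": date(2024, 3, 1), "first_enacted": None,              "as_designed": False},
]

def activation_status(row):
    """Classify one dealer's start against the intended window and form."""
    if row["first_enacted"] is None:
        return "not started"
    if not row["as_designed"]:
        return "modified"
    lag_days = (row["first_enacted"] - row["go_live"]).days
    return "on time" if lag_days <= ACTIVATION_WINDOW_DAYS else "late"

for row in activation_log:
    print(row["dealer"], activation_status(row))  # D-01 on time, D-02 late, ...
```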
Follow-up discipline is another. Many programs depend on consistent follow-up, but follow-up is exactly what degrades under pressure. If you only look at end results, you discover the degradation when it is already normalised. When you look at whether the follow-up cadence is being sustained in a real week, you can distinguish a design issue from an operating constraint that needs local reinforcement.
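One way to see sustainment rather than a launch burst is to compare follow-up counts per dealer per week against the designed cadence. The sketch below assumes you can pull those weekly counts, for instance from CRM task completions; the cadence of 20 per week and the 70% tolerance are illustrative assumptions, not benchmarks.

```python
# A minimal follow-up sustainment check: distinguish a steady routine
# from an initial burst that decayed once the inbox spiked.
EXPECTED_PER_WEEK = 20   # assumed designed cadence
SUSTAINED_FLOOR = 0.7    # assumed tolerance: below 70% of design counts as a lapse

weekly_follow_ups = {
    "D-01": [22, 21, 19, 20],  # steady: the routine survives real weeks
    "D-02": [35, 12, 4, 1],    # burst then fade: exposure without enactment
}

def cadence_verdict(counts):
    """Report whether the cadence held, or how many weeks fell below floor."""
    lapsed = [c for c in counts if c < EXPECTED_PER_WEEK * SUSTAINED_FLOOR]
    return "sustained" if not lapsed else f"lapsed in {len(lapsed)} week(s)"

for dealer, counts in weekly_follow_ups.items():
    print(dealer, cadence_verdict(counts))  # D-01 sustained, D-02 lapsed in 3 week(s)
```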
Targeting integrity is the third. Weak results are often blamed on “bad leads” or “wrong audience.” Sometimes that is true. Often the selection rules were quietly bent to fit convenience, perceived quality, or diary capacity.
You can see this in the field. A service manager looks at Thursday’s diary, sees it’s already full, and tells the team to aim the campaign at “easy wins” instead of the intended segment. The numbers will still look like activity, but you will no longer be evaluating the program you designed.
When you track whether the program is being aimed at the audience you designed for, you protect the meaning of your evaluation.
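If the designed selection rule can be written down as a predicate over customer records, the integrity check itself is short. In the sketch below, the segment definition, field names, and records are all hypothetical; the point is simply to count how many actual contacts fall inside the designed audience.

```python
# A minimal targeting-integrity check: how much of the contacted list
# actually matches the designed segment? All rules and data are illustrative.
def in_designed_segment(customer):
    """Assumed intended audience: vehicles 3-6 years old, 12+ months since service."""
    return (3 <= customer["vehicle_age_years"] <= 6
            and customer["months_since_service"] >= 12)

contacted = [
    {"id": "C-101", "vehicle_age_years": 4, "months_since_service": 15},
    {"id": "C-102", "vehicle_age_years": 1, "months_since_service": 3},   # an "easy win"
    {"id": "C-103", "vehicle_age_years": 5, "months_since_service": 18},
]

on_target = sum(in_designed_segment(c) for c in contacted)
print(f"targeting integrity: {on_target}/{len(contacted)} contacts in designed segment")
```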
Process ownership is the fourth, and it can stay simple. A routine survives when the trigger is clear, the owner is clear, and the exception path is agreed for when capacity is full or the customer goes silent. When those are unclear, responsibility bounces and the process degrades quietly.
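At its simplest, the ownership record is three named things per routine, and the check is whether any of them is blank. The routine and field values below are illustrative, not drawn from any specific program.

```python
# A minimal ownership record: one trigger, one owner, one exception path.
# The check only flags what has been left unclear.
from dataclasses import dataclass

@dataclass
class Routine:
    name: str
    trigger: str         # what starts the work
    owner: str           # whose job it is, by role
    exception_path: str  # what happens when capacity is full or the customer goes silent

    def unclear(self):
        return [f for f in ("trigger", "owner", "exception_path")
                if not getattr(self, f).strip()]

follow_up = Routine(
    name="lapsed-customer follow-up",
    trigger="daily CRM task at 08:30",
    owner="service advisor on morning huddle rota",
    exception_path="",  # not agreed yet: this is where responsibility will bounce
)

print(follow_up.unclear() or "routine fully owned")  # -> ['exception_path']
```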
Networks that do this well look boring in the best way
Mature networks do not win by measuring everything. They win by measuring a small set of enactment signals early, then using those signals to correct drift before the program becomes “variable by design.”
Their best performance conversations start with a simple discipline: prove execution first, then interpret outcomes. That shifts governance away from blame and toward operational support. It also protects central credibility, because central teams stop asking dealers to defend the results of a program no one can show was actually run.
A diagnostic reflection for your next review
In your last performance review, could you show that the program was enacted as designed before you judged whether it worked?
If outcomes were flat, could you show whether dealers started in the intended window and sustained follow-up discipline under pressure?
Could you also show that targeting rules stayed intact and ownership stayed clear enough for the routine to survive a real week?
If you can’t answer those questions cleanly, the next debate about performance will also be a debate about beliefs. The only way out is to measure execution early enough that you can see drift while it is still fixable.