Debug
Most DV debug content shows you tool features. This page is about debug engineering — the logging discipline, correlation IDs, scientific method, DUT instrumentation, triage automation, and replay techniques that make modern debug fast, reproducible, and ready for AI assistance.
It is a deliberate fusion of two traditions: the SystemVerilog / UVM toolkit that DV engineers already use, and the systematic debugging literature from software engineering — Andreas Zeller's scientific debugging, David Agans's 9 indispensable rules, Michael Feathers's work on legacy code, and the SRE observability canon. The premise is simple: a testbench whose logs are structured is a testbench whose failures are queryable — and a queryable testbench is one that scripts can triage, dashboards can summarize, and LLMs can reason about.
Foundations — Structured Debug Data
Logs, metrics, and traces already live inside your UVM testbench — just unstructured. These four cards convert each of them into a queryable surface.
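As a rough illustration of turning an existing human-readable log into a queryable one, the Python sketch below parses lines in what is assumed to be UVM's default report layout (severity, optional file and line, time, component path, bracketed ID, message) and emits one JSON object per line. The regular expression and the field names are assumptions for this example; a customized report server or `$timeformat` setting may need a different pattern.

```python
import json
import re

# Assumed shape: UVM_ERROR file.sv(88) @ 1250: uvm_test_top.env.sb [ID] message
# The file(line) part and the time unit are treated as optional.
UVM_LINE = re.compile(
    r"^(?P<severity>UVM_(?:INFO|WARNING|ERROR|FATAL))\s+"
    r"(?:(?P<file>\S+)\((?P<line>\d+)\)\s+)?"
    r"@\s*(?P<time>\d+(?:\.\d+)?)\s*(?P<unit>[a-z]+)?:\s+"
    r"(?P<component>\S+)\s+"
    r"\[(?P<id>[^\]]+)\]\s*"
    r"(?P<message>.*)$"
)


def parse_uvm_line(text):
    """Return a dict for one report line, or None if the line does not match."""
    match = UVM_LINE.match(text.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["line"] = int(record["line"]) if record["line"] else None
    record["time"] = float(record["time"])
    return record


def to_jsonl(lines):
    """Convert an iterable of raw log lines to JSONL, skipping non-report lines."""
    records = (parse_uvm_line(line) for line in lines)
    return "\n".join(json.dumps(r, sort_keys=True) for r in records if r)
```

Once the log is in this shape, a question such as "every UVM_ERROR from the scoreboard after time 1000" becomes a filter over records rather than a grep over free text.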
The Scientific Method of Debug
Zeller's Why Programs Fail reframes debugging as science: hypothesis, experiment, observation, refine. Agans's Rule 2 makes it actionable — debug starts when you can reproduce the failure deterministically.
Correctness — Self-Asserting Testbenches
Push correctness into the testbench itself: contracts on every interface, automated shrinking on every failure, golden references for every checker.
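One well-known way to automate shrinking is delta debugging, from Zeller's work listed under Further Reading. The Python sketch below is a simplified ddmin-style loop, not a reference implementation. It assumes the failing stimulus can be represented as a list of transactions and that a hypothetical `fails` callback replays a candidate subset and reports whether the failure still occurs.

```python
def ddmin(items, fails):
    """Shrink a failing list of items to a smaller list that still fails.

    `fails(candidate)` is a caller-supplied (hypothetical) oracle, for example
    a function that replays the candidate transactions in simulation.
    """
    if not fails(items):
        raise ValueError("the full input must fail before it can be shrunk")
    n = 2  # current number of chunks
    while len(items) >= 2:
        chunk = max(1, len(items) // n)
        subsets = [items[i:i + chunk] for i in range(0, len(items), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            # Everything except the current chunk.
            complement = [x for j, s in enumerate(subsets) if j != i for x in s]
            if fails(subset):
                # The chunk alone reproduces the failure: restart on it.
                items, n, reduced = subset, 2, True
                break
            if fails(complement):
                # The chunk is irrelevant: drop it and keep the granularity.
                items, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(items):
                break  # already testing single items; nothing more to remove
            n = min(len(items), n * 2)  # try finer-grained chunks
    return items
```

For example, if a failure needs transactions 3 and 11 out of 16, calling `ddmin(list(range(16)), lambda txns: 3 in txns and 11 in txns)` narrows the stimulus to those two. In a real flow each call to `fails` is a simulation run, so the oracle is usually the expensive part and worth caching.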
DUT Instrumentation — White-Box Observability
The DUT is not a black box once you have bind. These four cards cover non-intrusive harnessing, hang detection, and the X-state discipline that has no SWE analog.
Triage at Scale — Noise to Signal
When the regression has 10,000 failures, the first task is grouping, not debug. Three complementary algorithms collapse the noise.
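This overview does not spell out the grouping algorithms, so the Python sketch below shows only one generic approach, signature normalization: mask the run-specific values in each failure message and bucket failures by what remains. The record fields (`test`, `id`, `message`) and the masking patterns are assumptions made for this example.

```python
import re
from collections import defaultdict

# Order matters: mask sized SystemVerilog literals and 0x values before bare numbers.
_MASKS = [
    (re.compile(r"\d*'[hHbBdD][0-9a-fA-F_xXzZ]+"), "<VAL>"),
    (re.compile(r"0[xX][0-9a-fA-F_]+"), "<VAL>"),
    (re.compile(r"(?<!\w)\d+"), "<N>"),  # leaves digits inside names like axi4 alone
]


def signature(message):
    """Replace run-specific values so equivalent failures compare equal."""
    for pattern, placeholder in _MASKS:
        message = pattern.sub(placeholder, message)
    return message


def bucket_failures(failures):
    """Group failure records by (report ID, normalized message), largest bucket first."""
    groups = defaultdict(list)
    for failure in failures:
        key = (failure["id"], signature(failure["message"]))
        groups[key].append(failure["test"])
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
```

Sorting by bucket size puts the signature that explains the most failing tests at the top, which is usually the one to look at first.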
Replay & Resilience
Simulation is already deterministic — treat it that way. Then deliberately inject the chaos you would otherwise wait for production silicon to surface.
Process & GenAI Payoff
Once the foundations are in place, AI becomes the cheapest leverage on top of everything else — not a magic shortcut to skip the foundations.
Agans's 9 Indispensable Rules — Applied to DV
- Understand the System. Read the spec before grep'ing the log. The waveform is not a substitute for the protocol document.
- Make It Fail. Capture seed + plusargs + RTL hash + tool version into a one-line replay artifact. If you can't reproduce it, you can't debug it.
- Quit Thinking and Look. Open the waveform. The signal value is truth; your mental model is a guess.
- Divide and Conquer. Bisect commits, bisect transactions, bisect time. Binary search applies at every layer.
- Change One Thing at a Time. Never mutate testbench, RTL, and seed in the same debug iteration. Confounded variables hide the cause.
- Keep an Audit Trail. Every regression has structured logs. Every escape has a postmortem. Every bug links to its closing commit.
- Check the Plug. Verify the clock, reset, plusargs, license server, and tool version before debugging logic. Rule out the trivial explanation first; it is right more often than you expect.
- Get a Fresh View. Explain the bug to a colleague, or to an LLM. Articulating the problem often exposes the bug before the listener says a word.
- If You Didn't Fix It, It Ain't Fixed. Add a coverage point and a regression test for every closed bug. Otherwise the same bug returns wearing a different hat.
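The one-line replay artifact from the Make It Fail rule can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the field names, the truncated SHA-256 digest, and passing the RTL as an in-memory mapping of file name to bytes are all choices made for illustration, not a prescribed format.

```python
import hashlib
import json


def rtl_digest(rtl_files):
    """Hash RTL sources (name -> bytes) in sorted order so the digest is stable."""
    digest = hashlib.sha256()
    for name in sorted(rtl_files):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(rtl_files[name])
        digest.update(b"\0")
    return digest.hexdigest()[:16]


def replay_line(seed, plusargs, rtl_files, tool_version):
    """Return a single-line JSON record with everything needed to rerun a test."""
    artifact = {
        "seed": seed,
        "plusargs": list(plusargs),  # order preserved as given
        "rtl_hash": rtl_digest(rtl_files),
        "tool_version": tool_version,
    }
    return json.dumps(artifact, sort_keys=True, separators=(",", ":"))
```

In practice the mapping would be built by reading the compiled file list from disk, and a version-control commit ID may be a simpler stand-in for the RTL hash when the working tree is clean. Appending this line to every test log means a failure can be rerun from the log alone.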
Start Here — A 3-Step Adoption Path
- Add a JSON sidecar log. Extend uvm_report_server to emit a JSONL file alongside your human log. No test-code changes required.
- Stamp every transaction with a UUID. Generate it in pre_randomize() and carry it through driver, monitor, and scoreboard log lines.
- Pipe one failing log to an LLM. Use the bundle pattern: JSON slice (last ~200 events) + relevant RTL snippet + failing assertion. See what it catches before investing further.
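The bundle pattern from the third step can be assembled with a short Python script. The sketch below assumes the sidecar log is JSONL and that each event carries a hypothetical `txn_id` field holding the transaction UUID; the output keys are likewise illustrative. It optionally narrows the log to one transaction, keeps the last 200 events, and packages them with the RTL snippet and the failing assertion text.

```python
import json


def build_bundle(jsonl_text, rtl_snippet, failing_assertion, txn_id=None, max_events=200):
    """Build one JSON document to hand to an LLM or a human reviewer."""
    events = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    if txn_id is not None:
        # Follow a single transaction through driver, monitor, and scoreboard.
        events = [e for e in events if e.get("txn_id") == txn_id]
    tail = events[-max_events:] if max_events > 0 else []
    bundle = {
        "failing_assertion": failing_assertion,
        "rtl_snippet": rtl_snippet,
        "events": tail,
    }
    return json.dumps(bundle, indent=2)
```

Keeping the slice small is deliberate: a few hundred structured events plus the relevant RTL usually fits comfortably in a prompt, while a full regression log does not.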
Further Reading
- Andreas Zeller — Why Programs Fail: A Guide to Systematic Debugging
- Andreas Zeller — The Debugging Book (debuggingbook.org)
- David J. Agans — Debugging: The 9 Indispensable Rules
- Michael Feathers — Working Effectively with Legacy Code
- Diomidis Spinellis — Effective Debugging: 66 Specific Ways
- Google SRE — Site Reliability Engineering (postmortem culture chapter)
- OpenTelemetry Specification — semantic conventions and context propagation