Debug

Most design verification (DV) debug material shows you tool features. This page is about debug engineering: the logging discipline, correlation IDs, scientific method, DUT instrumentation, triage automation, and replay techniques that make modern debug fast, reproducible, and ready for AI assistance.

It is a deliberate fusion of two traditions: the SystemVerilog / UVM toolkit that DV engineers already use, and the systematic debugging literature from software engineering — Andreas Zeller's scientific debugging, David Agans's 9 indispensable rules, Michael Feathers's work on legacy code, and the SRE observability canon. The premise is simple: a testbench whose logs are structured is a testbench whose failures are queryable — and a queryable testbench is one that scripts can triage, dashboards can summarize, and LLMs can reason about.

Foundations — Structured Debug Data

Logs, metrics, and traces already live inside your UVM testbench, just in unstructured form. This section turns each of them into a queryable surface.
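Once each log line is a JSON object carrying a correlation ID, reconstructing a transaction's path through driver, monitor, and scoreboard becomes a simple group-by. This is a sketch only: the field names time, comp, and txn_id are assumptions about what a JSONL sidecar might emit, and missing_stage is a hypothetical example query.

```python
import json
from collections import defaultdict

def load_events(lines):
    """Parse JSONL log lines (one JSON object per line), skipping blanks."""
    return [json.loads(line) for line in lines if line.strip()]

def group_by_txn(events):
    """Group events by correlation ID ('txn_id'), each group sorted by sim time."""
    groups = defaultdict(list)
    for ev in events:
        if "txn_id" in ev:
            groups[ev["txn_id"]].append(ev)
    for evs in groups.values():
        evs.sort(key=lambda e: e["time"])
    return dict(groups)

def missing_stage(groups, comp):
    """Correlation IDs whose events never reached component `comp`."""
    return sorted(txn for txn, evs in groups.items()
                  if not any(e.get("comp") == comp for e in evs))
```

A query such as missing_stage(groups, "scoreboard") lists transactions that were driven but never checked, a question that is tedious to answer by grepping a free-text log.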

The Scientific Method of Debug

Zeller's Why Programs Fail reframes debugging as science: hypothesis, experiment, observation, refine. Agans's Rule 2 makes it actionable — debug starts when you can reproduce the failure deterministically.
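The reproduction gate can be made mechanical before any hypothesis is formed. The sketch below assumes a hypothetical run(seed) wrapper that launches the simulator and returns a hashable failure signature (or None on pass). It re-runs one seed a few times, and only then opens a hypothesis record modeled loosely on the hypothesis, prediction, experiment, observation loop; the Hypothesis fields are illustrative, not a prescribed format.

```python
from collections.abc import Callable, Hashable
from dataclasses import dataclass

def is_reproducible(run: Callable[[int], Hashable], seed: int, trials: int = 3) -> bool:
    """Re-run one seed several times; a deterministic failure yields one signature.

    `run` is a hypothetical wrapper that launches the simulation with `seed`
    and returns a hashable failure signature, or None if the test passed.
    """
    signatures = {run(seed) for _ in range(trials)}
    return len(signatures) == 1 and None not in signatures

@dataclass
class Hypothesis:
    """One iteration of the hypothesis -> prediction -> experiment -> observation loop."""
    claim: str
    prediction: str
    experiment: str
    observation: str = ""
    verdict: str = "open"  # becomes "confirmed" or "rejected"

    def conclude(self, observation: str, matched_prediction: bool) -> None:
        """Record what was seen and whether it matched the prediction."""
        self.observation = observation
        self.verdict = "confirmed" if matched_prediction else "rejected"
```

A rejected hypothesis is still progress: it narrows the search, and keeping the record makes the reasoning reviewable later.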

Correctness — Self-Asserting Testbenches

Push correctness into the testbench itself: contracts on every interface, automated shrinking on every failure, golden references for every checker.
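Automated shrinking can borrow Zeller's delta debugging (ddmin) directly. The sketch below is the simplified, complement-only variant of ddmin applied to a failing list of transactions. fails(seq) is a hypothetical callback that replays a candidate sequence (for example, through a directed sequence) and returns True if the same failure signature reappears. The result is 1-minimal: removing any single remaining item makes the failure disappear.

```python
def ddmin(items, fails):
    """Shrink `items` to a 1-minimal sublist for which fails(sublist) is True."""
    if not fails(items):
        raise ValueError("the full input must reproduce the failure")
    n = 2  # granularity: number of chunks to split into
    while len(items) >= 2:
        size = len(items) / n  # float chunk size so chunks cover every item
        start = 0.0
        reduced = False
        while start < len(items):
            lo, hi = int(start), int(start + size)
            complement = items[:lo] + items[hi:]  # drop one chunk
            if fails(complement):
                items = complement
                n = max(n - 1, 2)  # keep granularity roughly constant
                reduced = True
                break
            start += size
        if not reduced:
            if n >= len(items):  # already tried removing single items
                break
            n = min(n * 2, len(items))  # refine granularity
    return items
```

For example, if a failure needs only transactions 3 and 7 out of ten, ddmin(list(range(10)), lambda s: 3 in s and 7 in s) shrinks the sequence to just those two.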

DUT Instrumentation — White-Box Observability

The DUT is not a black box once you have SystemVerilog bind. This section covers non-intrusive harnessing, hang detection, and the X-state discipline, which has no software-engineering analog.
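Hang detection usually lives in the testbench as a watchdog, but a structured log also allows an offline complement. The sketch below assumes you can extract timestamps of forward-progress events (for example, completed transactions) from the JSONL sidecar; any gap longer than max_gap is reported, and a gap that runs into the end of simulation is the classic hang signature. The has_unknown helper is a similarly hypothetical log-side check for X or Z bits in logged 4-state literals; it complements, and does not replace, X checks in the bound harness.

```python
def find_stalls(progress_times, end_time, max_gap, start_time=0):
    """Return (from, to) windows longer than max_gap with no progress event.

    progress_times: timestamps of forward-progress events taken from the log,
    in whatever time unit the log uses. A stall ending at end_time means the
    simulation stopped while nothing was moving.
    """
    stalls = []
    prev = start_time
    for t in sorted(progress_times) + [end_time]:
        if t - prev > max_gap:
            stalls.append((prev, t))
        prev = t
    return stalls

def has_unknown(value):
    """True if a logged 4-state literal such as "8'b1x0z_0101" has X or Z bits."""
    digits = value.split("'", 1)[-1]  # drop the size prefix; base letters are never x/z
    return any(c in "xXzZ" for c in digits)
```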

Triage at Scale — Noise to Signal

When the regression has 10,000 failures, the first task is not debug; it is grouping. Three complementary algorithms collapse the noise.
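A common first pass, whether or not it is one of the three algorithms meant here, is signature normalization: mask the volatile parts of each error message (addresses, data values, timestamps, indices) so that failures with the same shape land in the same bucket. The regular expressions below are illustrative assumptions; tune them to your own message formats.

```python
import re
from collections import Counter

# Order matters: mask hex and SV literals before bare integers.
_VOLATILE = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),
    (re.compile(r"\d+'[bhdoBHDO][0-9a-fA-FxXzZ_]+"), "<VAL>"),
    (re.compile(r"@\s*\d+(\.\d+)?\s*[munpf]?s"), "@<TIME>"),
    (re.compile(r"\d+"), "<N>"),
]

def signature(msg):
    """Replace volatile tokens so messages that differ only in values match."""
    for pattern, token in _VOLATILE:
        msg = pattern.sub(token, msg)
    return msg

def bucket(messages):
    """Return (signature, count) pairs, largest bucket first."""
    return Counter(signature(m) for m in messages).most_common()
```

Triage then starts from the largest bucket: one representative failure per signature instead of thousands of near-duplicates.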

Replay & Resilience

Simulation is already deterministic — treat it that way. Then deliberately inject the chaos you would otherwise wait for production silicon to surface.
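A replay artifact can be a single JSON line capturing test name, seed, plusargs, RTL hash, and tool version. The Replay record and its field names below are hypothetical. The same idea extends to chaos injection: derive the fault schedule from a seed (fault_plan is likewise hypothetical), so any injected fault that exposes a bug is itself replayable.

```python
import json
import random
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class Replay:
    """Everything needed to rerun one failure; field names are illustrative."""
    test: str
    seed: int
    rtl_hash: str       # e.g. the commit of the RTL tree
    tool_version: str   # simulator name and version string
    plusargs: tuple = ()

    def to_line(self) -> str:
        """Serialize to one JSON line (sorted keys so identical runs diff cleanly)."""
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_line(line: str) -> "Replay":
        fields = json.loads(line)
        fields["plusargs"] = tuple(fields["plusargs"])  # JSON list -> tuple
        return Replay(**fields)

def fault_plan(n_events: int, seed: int, rate: float) -> list:
    """Indices of events to corrupt; the same seed always yields the same plan."""
    rng = random.Random(seed)
    return [i for i in range(n_events) if rng.random() < rate]
```

Attaching the to_line() output to every failure report means anyone can rerun the exact scenario without reconstructing the command line from memory.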

Process & GenAI Payoff

Once the foundations are in place, AI becomes the cheapest leverage on top of everything else — not a magic shortcut to skip the foundations.

Agans's 9 Indispensable Rules — Applied to DV

  1. Understand the System. Read the spec before grep'ing the log. The waveform is not a substitute for the protocol document.
  2. Make It Fail. Capture seed + plusargs + RTL hash + tool version into a one-line replay artifact. If you can't reproduce it, you can't debug it.
  3. Quit Thinking and Look. Open the waveform. The signal value is truth; your mental model is a guess.
  4. Divide and Conquer. Bisect commits, bisect transactions, bisect time. Binary search applies at every layer.
  5. Change One Thing at a Time. Never mutate testbench, RTL, and seed in the same debug iteration. Confounded variables hide the cause.
  6. Keep an Audit Trail. Every regression has structured logs. Every escape has a postmortem. Every bug links to its closing commit.
  7. Check the Plug. Verify the clock, reset, plusargs, license server, and tool version before debugging logic. The trivial explanation is often the right one.
  8. Get a Fresh View. Explain the bug to a colleague, or to an LLM. Simply articulating the problem often reveals the cause before the listener says a word.
  9. If You Didn't Fix It, It Ain't Fixed. Add a coverage point and a regression test for every closed bug. Otherwise the same bug returns wearing a different hat.
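Rule 4 is binary search. The sketch below finds the first bad item in an ordered sequence, assuming the first item is good, the last is bad, and everything after the first bad item stays bad. The is_bad predicate is a hypothetical hook that, for commits, would check out one revision and rerun the failing replay. For Git history, git bisect run automates the same search; the function is generic so the same code can bisect a transaction list or a list of time windows.

```python
from collections.abc import Callable, Sequence
from typing import TypeVar

T = TypeVar("T")

def first_bad(items: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Binary-search for the first bad item.

    Assumes items[0] is good, items[-1] is bad, and badness is monotonic.
    Calls is_bad roughly log2(len(items)) times.
    """
    if len(items) < 2:
        raise ValueError("need at least one good and one bad item")
    lo, hi = 0, len(items) - 1  # invariant: items[lo] good, items[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(items[mid]):
            hi = mid
        else:
            lo = mid
    return items[hi]
```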

Start Here — A 3-Step Adoption Path

  1. Add a JSON sidecar log. Extend uvm_report_server to emit a JSONL file alongside your human log. No test-code changes required.
  2. Stamp every transaction with a UUID. Generate it in pre_randomize() and carry it through driver, monitor, and scoreboard log lines.
  3. Pipe one failing log to an LLM. Use the bundle pattern — JSON slice (last ~200 events) + relevant RTL snippet + failing assertion. See what it catches before investing further.
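Steps 2 and 3 meet in the bundle. The sketch below assumes the JSON sidecar writes one event per line with a txn_id field carrying the UUID from step 2 (the field name is an assumption). It optionally filters to the failing transaction, keeps the last ~200 events, and puts the failing assertion and RTL snippet in front. The section labels are arbitrary, and sending the text to any particular LLM API is deliberately left out.

```python
import json

def build_bundle(jsonl_lines, rtl_snippet, assertion, txn_id=None, max_events=200):
    """Assemble a plain-text LLM prompt from one failing run.

    jsonl_lines: lines of the JSON sidecar log (one event per line).
    txn_id:      optional correlation ID used to keep only the failing
                 transaction's events.
    max_events:  size of the log tail to include.
    """
    events = [json.loads(line) for line in jsonl_lines if line.strip()]
    if txn_id is not None:
        events = [e for e in events if e.get("txn_id") == txn_id]
    tail = events[-max_events:] if max_events > 0 else []  # guard: [-0:] is everything
    sections = [
        "=== Failing assertion ===", assertion,
        "=== Relevant RTL ===", rtl_snippet,
        f"=== Last {len(tail)} log events (JSONL) ===",
    ]
    sections += [json.dumps(e, sort_keys=True) for e in tail]
    return "\n".join(sections)
```

Keeping the bundle small and structured is the point: the model sees the same focused evidence a human would start from, not a multi-megabyte log.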

Further Reading

  • Andreas Zeller — Why Programs Fail: A Guide to Systematic Debugging
  • Andreas Zeller — The Debugging Book (debuggingbook.org)
  • David J. Agans — Debugging: The 9 Indispensable Rules
  • Michael Feathers — Working Effectively with Legacy Code
  • Diomidis Spinellis — Effective Debugging: 66 Specific Ways
  • Google SRE — Site Reliability Engineering (postmortem culture chapter)
  • OpenTelemetry Specification — semantic conventions and context propagation