/debug¶
Analyze a failing or underperforming experiment and generate a structured debug report.
Usage¶
What It Does¶
- Identifies the failure -- Reads recent experiment logs, error output, and stack traces to understand what went wrong.
- Analyzes root cause -- Examines the code changes, configuration, data pipeline, and environment to pinpoint the probable cause.
- Generates recommendations -- Produces a ranked list of suggested fixes with explanations.
- Saves a debug report -- Writes a structured Markdown file to
.researchpad/experiments/debug/.
Example¶
This might produce:
With contents:
# Debug Report: NaN Loss During Training
## Error
RuntimeError: Loss became NaN at epoch 12, batch 847.
## Analysis
The learning rate of 1e-2 is too high for the current batch size (16).
Gradient norms exceeded 100.0 starting at epoch 10, indicating
gradient explosion before the NaN occurred.
## Suggested Fixes
1. Reduce learning rate to 1e-3 or lower
2. Add gradient clipping with max_norm=1.0
3. Increase batch size to 32 to stabilize gradients
## Related Experiments
- exp-041: Last successful run with lr=1e-3
- exp-042: Failed run with lr=1e-2
Artifact Format¶
Each debug report is a Markdown file saved to:
.researchpad/experiments/debug/YYYY-MM-DD-description.md
Reports follow a structured format with sections for the error, analysis, suggested fixes, and related experiments. The Dashboard displays these in the Debug panel.
Run /debug right after a failure
The command works best when run immediately after a failed experiment, while the error context and recent code changes are still available.