CartPole Baseline GRU (Before Fix)

Deployment Robustness: FAIL. Return 23% of nominal (jitter / delay / spike)
Stress Robustness: FAIL. Return 12% of nominal (5x speed)
Deployment Fragile
The agent degrades under deployment timing conditions. Recommended fix: train with speed randomization (jitter/delay/spike augmentation).

1. Robustness Test: Timing Perturbations

The audit wraps the environment with timing perturbations. The agent itself runs unmodified, with no internal intervention. The deployment scenarios (speed jitter, observation delay, mid-episode speed spike) model realistic conditions, while the stress scenario (5x speed, an unseen frequency) tests resilience at an extreme.
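The report does not show how the perturbations are implemented. The following is a minimal sketch under stated assumptions: a hypothetical environment with `reset()` and `step(action) -> (obs, reward, done)`; "speed" taken to mean the number of inner environment steps advanced per agent action (action repeat); and an observation delay meaning the agent receives the observation from `delay` agent steps earlier. `CounterEnv`, `TimingWrapper`, `run_episode`, and the spike window are illustrative names and parameters, not part of deltatau-audit.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interface (NOT the deltatau-audit API):
#   reset() -> obs,  step(action) -> (obs, reward, done)


class CounterEnv:
    """Toy stand-in env: obs counts inner steps, reward is 1.0 per inner step,
    and the episode ends after `horizon` inner steps."""

    def __init__(self, horizon: int = 20):
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon


class TimingWrapper:
    """Perturbs timing only; the agent is untouched.

    Assumed semantics: speed_fn(k) gives the number of inner env steps run for
    agent step k (must be >= 1); `delay` is how many agent steps old the
    returned observation is.
    """

    def __init__(self, env, speed_fn: Callable[[int], int], delay: int = 0):
        self.env = env
        self.speed_fn = speed_fn
        self.delay = delay
        self.k = 0
        self.buffer: List[int] = []

    def reset(self):
        self.k = 0
        obs = self.env.reset()
        # Pre-fill so the first `delay` steps return the initial observation.
        self.buffer = [obs] * (self.delay + 1)
        return self.buffer[0]

    def step(self, action):
        speed = self.speed_fn(self.k)
        assert speed >= 1, "speed must be at least one inner step"
        self.k += 1
        total, done = 0.0, False
        for _ in range(speed):  # repeat the same action `speed` times
            obs, reward, done = self.env.step(action)
            total += reward
            if done:
                break
        # Drop the oldest observation, append the newest, return the oldest kept.
        self.buffer = self.buffer[1:] + [obs]
        return self.buffer[0], total, done


def run_episode(env, policy=lambda obs: 0) -> Tuple[float, int]:
    """Return (episode return, number of agent decisions)."""
    obs, done, ret, steps = env.reset(), False, 0.0, 0
    while not done:
        obs, reward, done = env.step(policy(obs))
        ret += reward
        steps += 1
    return ret, steps


# Scenarios mirroring the report's labels (all parameters are assumptions).
rng = random.Random(0)
SCENARIOS = {
    "nominal": dict(speed_fn=lambda k: 1),
    "jitter (2 +/- 1)": dict(speed_fn=lambda k: rng.randint(1, 3)),
    "delay (1 step)": dict(speed_fn=lambda k: 1, delay=1),
    "spike (1-5-1)": dict(speed_fn=lambda k: 5 if 2 <= k < 4 else 1),
    "stress (5x)": dict(speed_fn=lambda k: 5),
}

for name, kwargs in SCENARIOS.items():
    print(name, run_episode(TimingWrapper(CounterEnv(20), **kwargs)))
```

In this toy environment the return stays constant because reward is paid per inner step; what changes is how many decisions the agent gets to make (20 at nominal speed, 4 at 5x). In a control task such as CartPole, fewer and staler decisions are what would plausibly drive the degradation the report measures.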

[Figure: Robustness Under Timing Perturbations (bar chart)]

Robustness Detail

Category    Scenario                     Return (% nominal)  95% CI        RMSE ratio  Return drop
Deployment  Speed jitter (2 +/- 1)       66%                 52%–83% ***   1.61x       33.9%
Deployment  Observation delay (1 step)   82%                 72%–94% ***   0.97x       17.8%
Deployment  Mid-episode spike (1-5-1)    23%                 18%–30% ***   1.32x       77.1%
Stress      5x Speed (unseen frequency)  12%                 9%–16% ***    1.78x       87.7%
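The report does not say how these columns are computed. A minimal sketch, assuming "Return (% nominal)" is the ratio of mean episode returns (perturbed over nominal), the 95% CI is a percentile bootstrap over episodes (a common choice swapped in here; the tool may use a different method), and the last column of the table is 1 minus that ratio. That last assumption is at least consistent with the table, where each return and drop pair sums to roughly 100% up to rounding. The RMSE ratio is left out because the report does not state which quantity its RMSE is measured on. All function names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Tuple


def return_ratio(perturbed: List[float], nominal: List[float]) -> float:
    """Mean perturbed return as a fraction of mean nominal return."""
    return mean(perturbed) / mean(nominal)


def bootstrap_ci(perturbed: List[float], nominal: List[float],
                 n_boot: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for return_ratio, resampling episodes
    with replacement within each condition independently."""
    rng = random.Random(seed)
    stats = sorted(
        return_ratio(rng.choices(perturbed, k=len(perturbed)),
                     rng.choices(nominal, k=len(nominal)))
        for _ in range(n_boot)
    )
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def return_drop(ratio: float) -> float:
    """Fractional loss of return relative to nominal (1 - ratio)."""
    return 1.0 - ratio


# Synthetic returns (NOT the report's data), 30 episodes per condition.
nominal = [100.0] * 30
perturbed = [float(x) for x in range(50, 80)]

ratio = return_ratio(perturbed, nominal)
lo, hi = bootstrap_ci(perturbed, nominal)
print(f"return {ratio:.1%} of nominal, 95% CI {lo:.1%}-{hi:.1%}, "
      f"drop {return_drop(ratio):.1%}")
```

Resampling each condition separately keeps the interval honest about noise in the nominal baseline as well as in the perturbed runs; with only 30 episodes per condition, as in this audit, intervals as wide as the jitter row's (52% to 83%) are unsurprising.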

Recommendation

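The agent degrades under all three deployment timing conditions: return falls to 66% of nominal under speed jitter, 82% under a one-step observation delay, and 23% under a mid-episode speed spike. Recommended fix: train with speed randomization (jitter/delay/spike augmentation).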
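A minimal sketch of what speed-randomized training could look like: each training episode draws one timing condition at random. It assumes "speed" means inner environment steps per agent action and "delay" means observation lag in agent steps. `TimingConfig`, `sample_timing`, the uniform choice among conditions, and the spike window ranges are illustrative assumptions, not a recipe prescribed by deltatau-audit.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class TimingConfig:
    name: str
    speed_fn: Callable[[int], int]  # agent step index -> inner env steps per action
    delay: int                      # observation lag, in agent steps


def sample_timing(rng: random.Random, max_steps: int = 500) -> TimingConfig:
    """Draw one timing condition for the next training episode (hypothetical scheme)."""
    kind = rng.choice(["nominal", "jitter", "delay", "spike"])
    if kind == "jitter":
        # Speed 2 +/- 1, resampled independently at every agent step.
        return TimingConfig(kind, lambda k: rng.randint(1, 3), 0)
    if kind == "delay":
        return TimingConfig(kind, lambda k: 1, 1)
    if kind == "spike":
        # 1-5-1 pattern: speed 5 inside a random window, speed 1 elsewhere.
        start = rng.randrange(max_steps // 2)
        length = rng.randint(5, 20)
        return TimingConfig(kind, lambda k: 5 if start <= k < start + length else 1, 0)
    return TimingConfig(kind, lambda k: 1, 0)


# Draw a fresh condition per episode; a training loop would apply it to the env.
rng = random.Random(0)
for episode in range(5):
    cfg = sample_timing(rng)
    print(episode, cfg.name, [cfg.speed_fn(k) for k in range(10)], cfg.delay)
```

One design consequence: because spike episodes expose the agent to speed 5, the 5x stress scenario would no longer be strictly unseen after retraining under this scheme.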

Speeds tested: [1, 2, 3, 5, 8] | Episodes per condition: 30 | Intervention support: False

Generated by deltatau-audit v0.3.5