CartPole Speed-Randomized GRU (After Fix)

Deployment Robustness: DEGRADED (worst-case return 62% of nominal across jitter / delay / spike)
Stress Robustness: FAIL (return 49% of nominal at 5x speed)
Verdict: Deployment Fragile
The agent degrades under deployment timing conditions. Recommended fix: train with speed randomization (jitter/delay/spike augmentation).

1. Robustness Test — Timing Perturbations

The test wraps the environment with timing perturbations. The agent itself runs unmodified, with no internal intervention. Deployment scenarios (jitter, delay, spike) model realistic operating conditions. The stress scenario (5x speed) tests resilience at a frequency the agent has not seen.
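The wrapping idea can be sketched as follows. This is a minimal illustration, not the deltatau-audit implementation: it assumes that "speed k" means the environment advances k internal steps per agent decision, and that observation delay means the agent sees the observation from one decision earlier. `CountingEnv`, `TimingPerturbation`, `fixed`, `jitter`, and `spike` are hypothetical names, and the spike window (decisions 20 to 40) is an arbitrary choice.

```python
import random
from collections import deque


class CountingEnv:
    """Toy stand-in environment (hypothetical): the observation is the step count."""

    def __init__(self, horizon=100):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done


class TimingPerturbation:
    """Wraps an env; the agent is untouched and only sees perturbed timing."""

    def __init__(self, env, speed_fn, obs_delay=0):
        self.env = env
        self.speed_fn = speed_fn    # maps decision index -> integer speed
        self.obs_delay = obs_delay  # number of decisions the observation lags

    def reset(self):
        obs = self.env.reset()
        self.k = 0
        # Buffer holds the last (obs_delay + 1) observations; index 0 is the oldest.
        size = self.obs_delay + 1
        self.buffer = deque([obs] * size, maxlen=size)
        return self.buffer[0]

    def step(self, action):
        speed = max(1, int(self.speed_fn(self.k)))
        self.k += 1
        total, done = 0.0, False
        # Assumption: speed k == repeat the action for k internal env steps.
        for _ in range(speed):
            obs, reward, done = self.env.step(action)
            total += reward
            if done:
                break
        self.buffer.append(obs)
        return self.buffer[0], total, done


def fixed(speed):
    """Constant speed, e.g. fixed(5) for the 5x stress scenario."""
    return lambda k: speed


def jitter(rng, base=2, spread=1):
    """Speed jitter: uniform integer in [base - spread, base + spread]."""
    return lambda k: rng.randint(base - spread, base + spread)


def spike(low=1, high=5, start=20, end=40):
    """Mid-episode spike: low speed, then high, then low again (1-5-1)."""
    return lambda k: high if start <= k < end else low
```

For example, `TimingPerturbation(CountingEnv(), jitter(random.Random(0)))` would approximate the jitter scenario, and `TimingPerturbation(CountingEnv(), fixed(1), obs_delay=1)` the one-step observation delay.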

[Figure: Robustness Under Timing Perturbations (bar chart)]

Robustness Detail

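Category   | Scenario                     | Return (% nominal) | 95% CI        | RMSE ratio | Return Change
Deployment | Speed jitter (2 +/- 1)       | 115%               | 103%–129%     | 1.04x      | -14.9%
Deployment | Observation delay (1 step)   | 95%                | 89%–102%      | 1.03x      | +4.5%
Deployment | Mid-episode spike (1-5-1)    | 62%                | 46%–79% ***   | 1.33x      | +37.6%
Stress     | 5x Speed (unseen frequency)  | 49%                | 33%–67% ***   | 1.59x      | +51.0%

Note: the Return Change values are consistent with a drop relative to nominal (100% minus return), so positive values indicate lower return than nominal.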
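The report does not state how the return percentage and its 95% CI are computed. One plausible reading, sketched below, is the ratio of mean episode returns with a percentile bootstrap over episodes. The function names are hypothetical, and the rule that flags a scenario as degraded when the CI upper bound falls below 100% is an assumption, not the tool's documented criterion.

```python
import random
import statistics


def pct_of_nominal(perturbed, nominal):
    """Mean perturbed return as a percentage of mean nominal return."""
    return 100.0 * statistics.fmean(perturbed) / statistics.fmean(nominal)


def bootstrap_ci(perturbed, nominal, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for pct_of_nominal (assumed method)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample episodes with replacement, independently per condition.
        p = rng.choices(perturbed, k=len(perturbed))
        n = rng.choices(nominal, k=len(nominal))
        stats.append(pct_of_nominal(p, n))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def summarize(perturbed, nominal):
    """Return percentage, CI, drop vs. nominal, and an assumed degradation flag."""
    pct = pct_of_nominal(perturbed, nominal)
    lo, hi = bootstrap_ci(perturbed, nominal)
    return {
        "return_pct": pct,
        "ci": (lo, hi),
        "drop_pct": 100.0 - pct,   # positive means worse than nominal
        "degraded": hi < 100.0,    # assumption: whole CI lies below nominal
    }
```

With 30 episodes per condition, as in this report, `summarize(perturbed_returns, nominal_returns)` would take two lists of 30 episode returns each.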

Recommendation

Agent degrades under deployment timing conditions. Recommended fix: train with speed randomization (jitter/delay/spike augmentation).
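Speed randomization during training could look roughly like the sketch below, which samples a fresh timing profile for every training episode. It is an illustration under assumptions rather than the tool's prescribed procedure: the speed pool reuses the speeds this report tested, and the function name, episode length, and spike widths are all hypothetical.

```python
import random

# Assumption: reuse the report's tested speeds as the training pool.
SPEEDS = [1, 2, 3, 5, 8]


def sample_episode_timing(rng, episode_len=200):
    """Sample one per-episode timing profile for training-time augmentation."""
    kind = rng.choice(["fixed", "jitter", "delay", "spike"])
    base = rng.choice(SPEEDS)
    schedule = [base] * episode_len  # speed to apply at each agent decision
    obs_delay = 0

    if kind == "jitter":
        # Perturb the base speed by -1, 0, or +1 at every decision.
        schedule = [max(1, base + rng.randint(-1, 1)) for _ in range(episode_len)]
    elif kind == "delay":
        # The agent sees observations one decision late.
        obs_delay = 1
    elif kind == "spike":
        # Jump to the fastest speed for a short window (hypothetical widths).
        start = rng.randrange(episode_len // 2)
        width = rng.randint(5, 20)
        for k in range(start, min(episode_len, start + width)):
            schedule[k] = max(SPEEDS)

    return {"kind": kind, "schedule": schedule, "obs_delay": obs_delay}
```

Each training episode would then apply `schedule[k]` as the environment speed at decision `k` and lag observations by `obs_delay`, so the policy is exposed to the same families of perturbation that the audit evaluates.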

Speeds tested: [1, 2, 3, 5, 8] | Episodes per condition: 30 | Intervention support: False

Generated by deltatau-audit v0.3.5