HalfCheetah PPO — Standard Training (Before)

Deployment Robustness: FAIL
Return 2% of nominal (worst case across jitter / delay / spike)
Stress Robustness: FAIL
Return -12% of nominal (5x speed)
Deployment Fragile
Agent degrades under deployment timing conditions. Recommended fix: train with speed randomization (jitter/delay/spike augmentation).

1. Robustness Test — Timing Perturbations

Wraps the environment with timing perturbations. The agent runs normally — no internal intervention. Deployment scenarios (jitter, delay, spike) model realistic conditions. Stress scenarios (5x speed) test extreme resilience.
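The wrapping described above can be sketched as follows. This is a minimal illustration, not the deltatau-audit implementation: it assumes a Gymnasium-style `reset`/`step` interface, and it assumes that "speed k" means each agent action is held for k underlying environment steps. `ToyEnv`, `ActionRepeat`, `ObservationDelay`, and the schedule helpers are hypothetical names, and the spike start and length are made-up values.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical stand-in env: observation is the step counter, reward is 1 per step."""

    def __init__(self, horizon=20):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        return self.t, 1.0, terminated, False, {}


class ActionRepeat:
    """Hold each agent action for speed_fn(agent_step) underlying env steps (assumed meaning of speed)."""

    def __init__(self, env, speed_fn):
        self.env = env
        self.speed_fn = speed_fn
        self.agent_step = 0

    def reset(self):
        self.agent_step = 0
        return self.env.reset()

    def step(self, action):
        repeats = max(1, int(self.speed_fn(self.agent_step)))
        self.agent_step += 1
        total = 0.0
        for _ in range(repeats):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total += reward  # accumulate reward over the repeated steps
            if terminated or truncated:
                break
        return obs, total, terminated, truncated, info


class ObservationDelay:
    """Hand the agent the observation from `delay` agent steps earlier."""

    def __init__(self, env, delay=1):
        self.env = env
        self.buffer = deque(maxlen=delay + 1)

    def reset(self):
        obs, info = self.env.reset()
        self.buffer.clear()
        self.buffer.append(obs)
        return obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.buffer.append(obs)
        # Oldest buffered observation; lags by `delay` once the buffer is full.
        return self.buffer[0], reward, terminated, truncated, info


def constant(speed):
    """Fixed speed, e.g. constant(5) for the 5x stress scenario."""
    return lambda step: speed


def jitter(rng, base=2, spread=1):
    """Speed drawn uniformly from base - spread .. base + spread at every agent step."""
    return lambda step: rng.randint(base - spread, base + spread)


def spike(start, length, low=1, high=5):
    """Speed `low`, then `high` for `length` agent steps from `start`, then `low` again (1-5-1)."""
    return lambda step: high if start <= step < start + length else low
```

The policy is untouched in every case: only the environment it interacts with changes, which matches the "no internal intervention" condition.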

Robustness Under Timing Perturbations

[Figure: robustness bar chart]

Robustness Detail

| Category | Scenario | Return (% nominal) | 95% CI | RMSE ratio | Return Change |
|---|---|---|---|---|---|
| Deployment | Speed jitter (2 +/- 1) | 28% | 25% to 32% *** | 2.37x | +71.9% |
| Deployment | Observation delay (1 step) | 2% | 0% to 4% *** | 3.92x | +98.1% |
| Deployment | Mid-episode spike (1-5-1) | 100% | 91% to 113% | 1.08x | +0.2% |
| Stress | 5x Speed (unseen frequency) | -12% | -14% to -10% *** | 4.90x | +111.9% |
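The report does not define the Return Change column. Judging from the figures alone, it appears to be the drop relative to nominal, that is, 100% minus the Return (% nominal) value, so a positive sign means a loss. The check below is a sketch of that inferred relationship using the numbers from the table; `implied_drop` and `max_gap` are hypothetical helper names.

```python
# (scenario, return as % of nominal, reported Return Change in %), copied from the table
ROWS = [
    ("Speed jitter (2 +/- 1)", 28, 71.9),
    ("Observation delay (1 step)", 2, 98.1),
    ("Mid-episode spike (1-5-1)", 100, 0.2),
    ("5x Speed (unseen frequency)", -12, 111.9),
]


def implied_drop(return_pct_nominal):
    """Drop relative to nominal, assuming Return Change = 100% - return."""
    return 100.0 - return_pct_nominal


def max_gap(rows):
    """Largest absolute difference between the implied and the reported drop."""
    return max(abs(implied_drop(ret) - reported) for _, ret, reported in rows)


for name, ret, reported in ROWS:
    print(f"{name}: implied {implied_drop(ret):.1f}%, reported {reported:.1f}%")
```

Every row agrees to within half a percentage point, which is what rounding the return to a whole percent would produce.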

Recommendation

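The agent degrades under deployment timing conditions: return falls to 28% of nominal under speed jitter and to 2% under a one-step observation delay, while the mid-episode spike leaves it at 100%. Recommended fix: train with speed randomization (jitter/delay/spike augmentation).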
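One way to read "speed randomization" is a training-time wrapper that draws a new base speed at every reset and jitters it at every step. The sketch below assumes a Gymnasium-style `reset`/`step` interface and treats speed as the number of underlying steps per agent action. `CountingEnv`, `SpeedRandomized`, and the default speed set `(1, 2, 3)` are hypothetical, not taken from deltatau-audit.

```python
import random


class CountingEnv:
    """Hypothetical stand-in env: reward 1 per underlying step, ends after `horizon` steps."""

    def __init__(self, horizon=50):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon, False, {}


class SpeedRandomized:
    """Sample a base speed per episode and add per-step jitter (assumed augmentation scheme)."""

    def __init__(self, env, speeds=(1, 2, 3), jitter=1, seed=None):
        self.env = env
        self.speeds = tuple(speeds)
        self.jitter = jitter
        self.rng = random.Random(seed)
        self.base_speed = self.speeds[0]

    def reset(self):
        # New base speed for every episode.
        self.base_speed = self.rng.choice(self.speeds)
        return self.env.reset()

    def step(self, action):
        # Jitter the base speed, never going below one underlying step.
        offset = self.rng.randint(-self.jitter, self.jitter)
        repeats = max(1, self.base_speed + offset)
        total = 0.0
        for _ in range(repeats):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total += reward
            if terminated or truncated:
                break
        return obs, total, terminated, truncated, info


def run_episode(env, policy=lambda obs: 0):
    """Roll out one episode and return the summed reward."""
    obs, _ = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total
```

In use, the training environment would be wrapped this way before being handed to the PPO trainer. Observation delay and spike schedules could be layered on in the same manner. Which speeds to include is a design choice: leaving 5x out of training keeps the stress scenario an unseen frequency.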

Speeds tested: [1, 2, 3, 5, 8] | Episodes per condition: 30 | Intervention support: False

Generated by deltatau-audit v0.3.4