Anomaly Detection

BatAudit detects anomalies automatically using statistical methods, with no external ML services, no model training, and no cloud dependencies.

The anomaly engine runs inside the Worker process and evaluates every batch of incoming events.


Detectors

Volume Spike (Z-score)

Monitors the rate of incoming events per service and computes a z-score against a rolling baseline. If the z-score exceeds the threshold (default: 3.0), a volume_spike alert is generated.

This catches sudden traffic bursts such as load attacks, runaway retry loops, or misconfigured clients.

Configuration:

ANOMALY_VOLUME_THRESHOLD=3.0   # z-score threshold
ANOMALY_VOLUME_WINDOW=300      # sliding window in seconds
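
The z-score check described above can be sketched as follows. This is a minimal illustration, not BatAudit's actual implementation: the function names are hypothetical, and it assumes the baseline is a list of per-window event counts for one service and that the population standard deviation is used. How BatAudit builds its rolling baseline is not specified here.

```python
from statistics import mean, pstdev


def volume_z_score(baseline_counts, current_count):
    """Return the z-score of current_count against the baseline samples.

    baseline_counts: event counts from earlier windows (assumed input shape).
    """
    if len(baseline_counts) < 2:
        return 0.0  # not enough history to judge
    mu = mean(baseline_counts)
    sigma = pstdev(baseline_counts)
    if sigma == 0:
        # Perfectly flat baseline: the z-score is undefined.
        # This sketch chooses to report "no spike" in that case.
        return 0.0
    return (current_count - mu) / sigma


def is_volume_spike(baseline_counts, current_count, threshold=3.0):
    """True when the z-score exceeds the threshold (default 3.0, as documented)."""
    return volume_z_score(baseline_counts, current_count) > threshold
```

With a baseline of `[100, 102, 98, 101, 99]`, a current count of 110 is about seven standard deviations above the mean and would be flagged, while 102 would not.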

Error Rate

Tracks the share of responses with a 4xx or 5xx status code within a sliding window. If the error rate exceeds the threshold (default: 20%), an error_rate alert is generated.

Useful for detecting deployment failures, downstream service degradation, or sudden spikes in bad requests.

Configuration:

ANOMALY_ERROR_RATE_THRESHOLD=20.0  # percentage
ANOMALY_ERROR_RATE_WINDOW=300      # sliding window in seconds
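
As a rough sketch of the error-rate calculation, the snippet below computes the percentage of 4xx and 5xx responses among the status codes seen in one window. The function names are hypothetical, and it assumes the denominator is all responses in the window and that the alert fires only when the rate is strictly above the threshold.

```python
def error_rate_percent(status_codes):
    """Percentage of responses in the window with a 4xx or 5xx status."""
    if not status_codes:
        return 0.0  # empty window: nothing to measure
    errors = sum(1 for code in status_codes if 400 <= code <= 599)
    return 100.0 * errors / len(status_codes)


def is_error_rate_alert(status_codes, threshold_percent=20.0):
    """True when the error rate exceeds the threshold (default 20%, as documented)."""
    return error_rate_percent(status_codes) > threshold_percent
```

Under these assumptions, a window with 3 errors out of 10 responses (30%) would raise an alert, while exactly 2 out of 10 (20%) would not.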

Brute Force

Detects when the same identifier produces repeated 401 responses in a short period (default: 10 failures in 5 minutes). Generates a brute_force alert.

Configuration:

ANOMALY_BRUTE_FORCE_THRESHOLD=10   # failure count
ANOMALY_BRUTE_FORCE_WINDOW=300     # window in seconds
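
One way to picture this rule is a per-identifier sliding window of 401 timestamps. The class below is a hypothetical sketch, not BatAudit's code: it assumes timestamps are seconds, that the alert fires once the count reaches the threshold, and that "identifier" is whatever key the events carry.

```python
from collections import defaultdict, deque


class BruteForceDetector:
    """Counts 401 responses per identifier inside a sliding time window."""

    def __init__(self, threshold=10, window_seconds=300):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # identifier -> 401 timestamps

    def record(self, identifier, status_code, timestamp):
        """Record one response; return True when the identifier hits the threshold."""
        if status_code != 401:
            return False
        window = self._failures[identifier]
        window.append(timestamp)
        # Drop failures that have aged out of the window.
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold
```

Keeping a deque per identifier means expired failures are discarded as new ones arrive, so memory stays proportional to recent activity.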

Mass Delete

Triggers when many DELETE requests arrive within a short window (default: 50 in 5 minutes). Useful for detecting accidental bulk deletes or malicious data deletion.

Configuration:

ANOMALY_MASS_DELETE_THRESHOLD=50   # request count
ANOMALY_MASS_DELETE_WINDOW=300     # window in seconds
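
The mass-delete check amounts to counting DELETE requests in the most recent window. The sketch below assumes events arrive as `(timestamp_seconds, http_method)` tuples and that the alert fires once the count reaches the threshold. Both the input shape and the function names are illustrative assumptions, and whether BatAudit scopes the count per service or per actor is not stated here.

```python
def count_recent_deletes(events, now, window_seconds=300):
    """Count DELETE requests with a timestamp in (now - window_seconds, now]."""
    cutoff = now - window_seconds
    return sum(
        1
        for ts, method in events
        if method.upper() == "DELETE" and cutoff < ts <= now
    )


def is_mass_delete(events, now, threshold=50, window_seconds=300):
    """True when the DELETE count in the window reaches the threshold."""
    return count_recent_deletes(events, now, window_seconds) >= threshold
```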

Silent Service

Triggers when a service that was active stops sending events for longer than the threshold (default: 15 minutes). Detects crashed services, broken deployments, or network partitions that prevent events from reaching BatAudit.

Configuration:

ANOMALY_SILENT_SERVICE_MINUTES=15  # silence threshold in minutes
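
The silence check can be sketched as a comparison between each service's last-seen time and the current time. The function name and the `last_seen` mapping are hypothetical, and timestamps are assumed to be in seconds.

```python
def silent_services(last_seen, now, threshold_minutes=15):
    """Return the names of services whose last event is older than the threshold.

    last_seen: mapping of service name -> timestamp of its most recent event.
    Only services that were active at some point appear in the mapping.
    """
    limit_seconds = threshold_minutes * 60
    return sorted(
        name for name, ts in last_seen.items() if now - ts > limit_seconds
    )
```

For example, with `last_seen = {"api": 1000, "billing": 100}` and `now = 1100`, only `billing` has been quiet for more than 15 minutes.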

Alert cooldown

To avoid alert storms, each rule has a cooldown period. A rule won't fire again for the same service until the cooldown expires.

ANOMALY_COOLDOWN=5m   # default cooldown between alerts for the same rule+service

The demo uses ANOMALY_COOLDOWN=1m so alerts are visible quickly.
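
A cooldown of this kind is commonly implemented by remembering when each rule last fired for each service. The class below is a hypothetical sketch under that assumption, with the cooldown given in seconds (300 corresponds to the documented default of 5m).

```python
class CooldownGate:
    """Suppresses repeat alerts for the same (rule, service) pair during a cooldown."""

    def __init__(self, cooldown_seconds=300):
        self.cooldown_seconds = cooldown_seconds
        self._last_fired = {}  # (rule, service) -> timestamp of last alert

    def allow(self, rule, service, now):
        """Return True and record the firing if the pair is not cooling down."""
        key = (rule, service)
        last = self._last_fired.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still inside the cooldown: suppress
        self._last_fired[key] = now
        return True
```

Because the key includes both the rule and the service, a volume_spike alert for one service does not suppress the same rule for another service, or a different rule for the same service.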


Viewing alerts

Alerts appear in:

  • Dashboard → Anomalies page
  • Notifications (Web Push + Webhooks, if configured)
  • API: GET /v1/audit/anomalies
  • Events list: filter by event_type=system.alert
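
As an illustration of the API route listed above, the following sketch builds a GET request for /v1/audit/anomalies using only the Python standard library. The base URL is a placeholder, the function names are hypothetical, and the sketch assumes the endpoint returns JSON. Any authentication your deployment requires is not shown.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # placeholder; replace with your BatAudit API address


def build_anomalies_request(base_url=BASE_URL):
    """Build (but do not send) a GET request for the anomalies endpoint."""
    url = base_url.rstrip("/") + "/v1/audit/anomalies"
    return urllib.request.Request(url, method="GET")


def fetch_anomalies(base_url=BASE_URL, timeout=10):
    """Send the request and decode the body, assuming a JSON response."""
    request = build_anomalies_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```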