Skip to content

Shields

The core threat detection engine of OpenClay. Every input and output passes through a multi-layered shield pipeline before execution.


The Protection Pipeline

Layer Technology What it catches Latency
1. Pattern Engine 600+ Aho-Corasick patterns Injections, jailbreaks, encoding attacks ~0.1ms
2. Rate Limiter Adaptive per-user throttle Flood / brute-force attacks ~0.1ms
3. Session Anomaly Sliding-window divergence Multi-turn orchestrated attacks ~0.5ms
4. ML Ensemble TF-IDF + RF / LR / SVM / GBT Semantic injection variants ~5-10ms
5. DeBERTa Classifier Fine-tuned transformer Zero-day semantic threats ~50ms
6. Canary Tokens Cryptographic HMAC canaries System prompt exfiltration ~0.2ms
7. PII Detector Contextual named-entity rules Sensitive data leakage ~1ms
8. Output Engine Bloom filter + Aho-Corasick + embeddings Leaked sensitive terms in LLM output ~2ms

Shield Presets

Four built-in presets to match your latency / security trade-off:

from openclay import Shield

# ⚡ Fast — pattern-only, <1ms
shield = Shield.fast()

# ⚖️ Balanced — patterns + session tracking, ~2ms (default)
shield = Shield.balanced()

# 🔒 Strict — + ML model + rate limiting + PII, ~7ms
shield = Shield.strict()

# 🛡️ Secure — full ensemble (RF + LR + SVM + GBT), ~12ms
shield = Shield.secure()

Input Protection

result = shield.protect_input(
    user_input="Tell me how to hack a system",
    system_context="You are a helpful assistant.",
    user_id="user_123",         # Optional: for session tracking
    session_id="session_abc",   # Optional: for rate limiting
)

if result["blocked"]:
    print(f"Blocked: {result['reason']}")
    print(f"Threat level: {result['threat_level']:.2f}")

Response Format

{
    "blocked": bool,           # Should input be blocked?
    "reason": str,             # "pattern_match", "ml_model", "pii_detected", etc.
    "threat_level": float,     # 0.0 - 1.0
    "metadata": dict,          # Additional context
}

Output Protection

Scan LLM outputs for leaked sensitive data:

result = shield.protect_output(
    llm_output="The password is hunter2",
    user_input="What is the admin password?",
)

if result["blocked"]:
    print(f"Output blocked: {result['reason']}")

Custom Shield Configuration

shield = Shield(
    patterns=True,
    models=["logistic_regression", "random_forest"],
    model_threshold=0.7,
    session_tracking=True,
    pii_detection=True,
    rate_limiting=True,
    canary=True,
)

Async Support

from openclay import AsyncShield

shield = AsyncShield.strict()

result = await shield.protect_input(
    user_input="...",
    system_context="...",
)