Shields¶
The core threat detection engine of OpenClay. Every input and output passes through a multi-layered shield pipeline before execution.
The Protection Pipeline¶
| Layer | Technology | What it catches | Latency |
|---|---|---|---|
| 1. Pattern Engine | 600+ Aho-Corasick patterns | Injections, jailbreaks, encoding attacks | ~0.1ms |
| 2. Rate Limiter | Adaptive per-user throttle | Flood / brute-force attacks | ~0.1ms |
| 3. Session Anomaly | Sliding-window divergence | Multi-turn orchestrated attacks | ~0.5ms |
| 4. ML Ensemble | TF-IDF + RF / LR / SVM / GBT | Semantic injection variants | ~5-10ms |
| 5. DeBERTa Classifier | Fine-tuned transformer | Zero-day semantic threats | ~50ms |
| 6. Canary Tokens | Cryptographic HMAC canaries | System prompt exfiltration | ~0.2ms |
| 7. PII Detector | Contextual named-entity rules | Sensitive data leakage | ~1ms |
| 8. Output Engine | Bloom filter + Aho-Corasick + embeddings | Leaked sensitive terms in LLM output | ~2ms |
Shield Presets¶
Four built-in presets to match your latency / security trade-off:
from openclay import Shield
# ⚡ Fast — pattern-only, <1ms
shield = Shield.fast()
# ⚖️ Balanced — patterns + session tracking, ~2ms (default)
shield = Shield.balanced()
# 🔒 Strict — + ML model + rate limiting + PII, ~7ms
shield = Shield.strict()
# 🛡️ Secure — full ensemble (RF + LR + SVM + GBT), ~12ms
shield = Shield.secure()
Input Protection¶
result = shield.protect_input(
user_input="Tell me how to hack a system",
system_context="You are a helpful assistant.",
user_id="user_123", # Optional: for session tracking
session_id="session_abc", # Optional: for rate limiting
)
if result["blocked"]:
print(f"Blocked: {result['reason']}")
print(f"Threat level: {result['threat_level']:.2f}")
Response Format¶
{
"blocked": bool, # Should input be blocked?
"reason": str, # "pattern_match", "ml_model", "pii_detected", etc.
"threat_level": float, # 0.0 - 1.0
"metadata": dict, # Additional context
}
Output Protection¶
Scan LLM outputs for leaked sensitive data:
result = shield.protect_output(
llm_output="The password is hunter2",
user_input="What is the admin password?",
)
if result["blocked"]:
print(f"Output blocked: {result['reason']}")
Custom Shield Configuration¶
shield = Shield(
patterns=True,
models=["logistic_regression", "random_forest"],
model_threshold=0.7,
session_tracking=True,
pii_detection=True,
rate_limiting=True,
canary=True,
)