StackOne Defender

Benchmark Performance Comparison

1. Consistency Across Benchmarks

StackOne dominates with 87-97% F1 across ALL benchmarks (90.8% avg), while competitors are either wildly inconsistent (Meta PG v1: 68% variance, DeBERTa: 122% variance) or consistently mediocre (Meta PG v2: 63% avg F1).

🏆
StackOne: 90.8% avg F1 (BEST OVERALL)

Excellent on ALL benchmarks: 87-97% F1 (always 87%+)

⚠️
Meta PG v1: 68% variance

F1 ranges from 0.55 to 0.92 - unpredictable in production

⚠️
Meta PG v2: 63% avg F1

Fixed variance but now consistently mediocre (0.60-0.67)

🔴
DeBERTa: 122% variance

Wild swings: 0.33-0.74 F1 (catastrophic on xxz224)
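The F1 scores compared above are the standard harmonic mean of precision and recall. A minimal sketch of how a single benchmark's F1 falls out of a confusion matrix (the counts below are illustrative, not from any benchmark here):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts: 90 attacks caught, 10 benign flagged, 10 attacks missed.
print(round(f1_score(90, 10, 10), 2))  # 0.9
```

A model with stable F1 across benchmarks keeps both terms high everywhere; the "variance" figures above capture how far a competitor's per-benchmark F1 swings around its own average.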

2. False Positive Rate

Lower false positive rate means fewer benign prompts incorrectly flagged as attacks. Critical for production usability.

🏆
StackOne: 16.5% FP rate

140 false positives out of 847 benign prompts

⚠️
Meta PG v1: 50% FP rate

423 false positives - flags half of benign prompts!
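The FP rates above are simple ratios over the same 847-prompt benign set; a quick check of both figures:

```python
def fp_rate(false_positives: int, benign_total: int) -> float:
    """Fraction of benign prompts incorrectly flagged as attacks."""
    return false_positives / benign_total

# Figures from the comparison above (847 benign prompts):
print(f"StackOne:   {fp_rate(140, 847):.1%}")  # 16.5%
print(f"Meta PG v1: {fp_rate(423, 847):.1%}")  # 49.9%
```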

3. Model Size Comparison

Smaller models enable edge deployment, mobile apps, and browser execution. StackOne is 48x smaller than Meta PG and 32x smaller than DeBERTa.

📦
22 MB quantized ONNX

Runs on browsers, mobile devices, IoT - anywhere

🚀
48x smaller than Meta PG

1,064 MB vs 22 MB (Meta's actual on-disk size, not the claimed 344 MB)
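The size multiples follow directly from the on-disk sizes above. A quick check (note: DeBERTa's size is not reported directly here, so the ~704 MB below is back-computed from the stated 32x ratio):

```python
sizes_mb = {
    "StackOne (quantized ONNX)": 22,
    "Meta PG": 1064,   # reported on-disk size
    "DeBERTa": 704,    # inferred from the "32x smaller" claim (22 MB x 32)
}

baseline = sizes_mb["StackOne (quantized ONNX)"]
for model, mb in sizes_mb.items():
    print(f"{model}: {mb:,} MB ({mb / baseline:.0f}x StackOne)")  # Meta PG -> 48x
```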

4. Latency: CPU vs GPU

StackOne runs faster on standard CPU than competitors on expensive T4 GPU. No GPU required = lower infrastructure costs.

🚀
~10ms on CPU

Real-time protection without expensive GPU instances

💰
4.3x faster than GPU competitors

43ms on T4 GPU ($0.60/hour) vs 10ms on standard CPU
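Latency claims like these are straightforward to reproduce with a warmed-up timing loop. A minimal, model-agnostic harness (the workload below is a stand-in; swap in the actual inference call, e.g. an ONNX Runtime `session.run`):

```python
import time

def median_latency_ms(fn, warmup: int = 10, runs: int = 100) -> float:
    """Median wall-clock latency of fn() in milliseconds, after warmup runs."""
    for _ in range(warmup):
        fn()  # warm caches / JIT before measuring
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]

# Stand-in workload; replace with the real model call being benchmarked.
ms = median_latency_ms(lambda: sum(i * i for i in range(50_000)))
print(f"median: {ms:.2f} ms")
```

Reporting the median (rather than the mean) keeps one-off scheduler hiccups from skewing the comparison.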

5. Accuracy vs Size Trade-off

StackOne achieves enterprise-grade accuracy (90.8% avg F1, never below 87%) with an edge-deployable size (22 MB). The sweet spot for production deployments.

🏆
Winner: 91% F1 @ 22 MB

BEST accuracy + edge-deployable size (up to 48x smaller than competitors)

📊
Beats DistilBERT despite being 81x smaller

91% F1 vs DistilBERT's 86% - superior performance AND efficiency

Performance Summary

All metrics at a glance

Model        Avg F1   Size       Latency   FP Rate   Hardware   Consistency (F1 range)
StackOne     90.8%    22 MB      ~10 ms    16.5%     CPU        0.87-0.97
Meta PG v1   -        1,064 MB   ~43 ms    50%       T4 GPU     0.55-0.92 (68% variance)
Meta PG v2   63%      -          -         -         T4 GPU     0.60-0.67
DeBERTa      -        ~704 MB    -         -         T4 GPU     0.33-0.74 (122% variance)