Benchmark Performance Comparison
StackOne dominates with 87-97% F1 on every benchmark (90.8% average), while competitors are either wildly inconsistent (relative best-to-worst F1 swing of 68% for v1 and 122% for DeBERTa) or consistently mediocre (v2: 63% average F1).
StackOne: excellent on every benchmark, 87-97% F1 (never below 87%)
v1: F1 ranges from 0.55 to 0.92, which makes it unpredictable in production
v2: variance fixed, but now consistently mediocre (0.60-0.67 F1)
DeBERTa: wild swings from 0.33 to 0.74 F1 (catastrophic on the xxz224 benchmark)
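One way to read the "68%" and "122%" consistency figures is as a relative spread: how far the best benchmark F1 sits above the worst, as a fraction of the worst. The sketch below applies that formula to the rounded ranges quoted above. The formula, the dictionary name `F1_RANGES`, and the helper `relative_spread` are assumptions for illustration, not the page's documented method. On the rounded ranges it gives about 67% for v1 and 124% for DeBERTa, close to the quoted figures, which were presumably computed from unrounded scores.

```python
# Hypothetical sketch: relative F1 spread = (best - worst) / worst.
# Ranges are the rounded per-model values quoted above; the formula is an assumption.
F1_RANGES = {
    "StackOne": (0.87, 0.97),
    "v1": (0.55, 0.92),
    "v2": (0.60, 0.67),
    "DeBERTa": (0.33, 0.74),
}


def relative_spread(worst: float, best: float) -> float:
    """Return how far the best score exceeds the worst, relative to the worst."""
    if worst <= 0:
        raise ValueError("worst score must be positive")
    return (best - worst) / worst


spreads = {name: relative_spread(lo, hi) for name, (lo, hi) in F1_RANGES.items()}

if __name__ == "__main__":
    for name, spread in spreads.items():
        # Print each model's spread as a whole-number percentage.
        print(f"{name:>8}: {spread:.0%}")
```

A small spread means a model behaves similarly across benchmarks; it says nothing about whether the level itself is good, which is why v2 can be both "consistent" and "mediocre".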
A lower false positive rate means fewer benign prompts are incorrectly flagged as attacks, which is critical for production usability.
140 false positives out of 847 benign prompts (16.5% FP rate)
423 false positives out of 847 benign prompts (49.9% FP rate): half of all benign prompts flagged!
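The false positive rates follow directly from the counts above: false positives divided by the total number of benign prompts. The helper name `false_positive_rate` is illustrative.

```python
# False positive rate (FPR) = benign prompts flagged as attacks / all benign prompts.
BENIGN_PROMPTS = 847


def false_positive_rate(false_positives: int, benign_total: int) -> float:
    """Fraction of benign prompts incorrectly flagged as attacks."""
    if benign_total <= 0:
        raise ValueError("benign_total must be positive")
    if not 0 <= false_positives <= benign_total:
        raise ValueError("false_positives must be between 0 and benign_total")
    return false_positives / benign_total


low_fpr = false_positive_rate(140, BENIGN_PROMPTS)
high_fpr = false_positive_rate(423, BENIGN_PROMPTS)

if __name__ == "__main__":
    print(f"140/847 -> {low_fpr:.1%}")   # 16.5%
    print(f"423/847 -> {high_fpr:.1%}")  # 49.9%
```

In practice, 49.9% means roughly one benign prompt in two would be blocked or escalated, while 16.5% means roughly one in six.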
Smaller models enable edge deployment, mobile apps, and browser execution. StackOne is 48x smaller than Meta PG and 32x smaller than DeBERTa.
Runs in browsers, on mobile devices, and on IoT hardware
Meta PG: 1,064 MB as measured (Meta claims 344 MB) vs StackOne: 22 MB
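The "48x smaller" figure is the ratio of Meta PG's measured size to StackOne's. The sketch below also shows the ratio against Meta's claimed size, and back-computes a rough DeBERTa size from the quoted 32x figure. That DeBERTa size is an estimate derived here, not a measured value from the page, and the constant names are illustrative.

```python
# Model sizes in MB, taken from the comparison above.
STACKONE_MB = 22
META_PG_MEASURED_MB = 1_064
META_PG_CLAIMED_MB = 344
DEBERTA_RATIO = 32  # quoted "32x smaller" figure (rounded)


def size_ratio(competitor_mb: float, ours_mb: float) -> float:
    """How many times larger the competitor model is."""
    return competitor_mb / ours_mb


measured_ratio = size_ratio(META_PG_MEASURED_MB, STACKONE_MB)  # about 48.4x
claimed_ratio = size_ratio(META_PG_CLAIMED_MB, STACKONE_MB)    # about 15.6x

# Rough estimate only: back-computed from the rounded 32x ratio.
implied_deberta_mb = DEBERTA_RATIO * STACKONE_MB

if __name__ == "__main__":
    print(f"Meta PG (measured): {measured_ratio:.1f}x larger")
    print(f"Meta PG (claimed):  {claimed_ratio:.1f}x larger")
    print(f"DeBERTa (implied):  ~{implied_deberta_mb} MB")
```

Even taking Meta's claimed 344 MB at face value, StackOne would still be roughly 15x smaller.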
StackOne runs faster on a standard CPU than competitors do on an expensive T4 GPU. No GPU is required, which lowers infrastructure costs.
Real-time protection without expensive GPU instances
Competitors: 43 ms on a T4 GPU ($0.60/hour) vs StackOne: 10 ms on a standard CPU
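To put the latency numbers in cost terms, the sketch below converts per-request latency into requests per hour and GPU cost per million requests. It assumes strictly sequential, single-stream inference with no batching and a fully dedicated instance. Real throughput with batching or concurrency will differ, and no CPU price is given above, so CPU cost is not computed. Function names are hypothetical.

```python
# Back-of-envelope latency/cost comparison.
# Assumption: one request at a time, no batching, instance fully dedicated.
T4_USD_PER_HOUR = 0.60
GPU_LATENCY_S = 0.043  # competitors on a T4 GPU
CPU_LATENCY_S = 0.010  # StackOne on a standard CPU
SECONDS_PER_HOUR = 3600.0


def sequential_requests_per_hour(latency_s: float) -> float:
    """Requests one worker can serve per hour at the given latency."""
    return SECONDS_PER_HOUR / latency_s


def usd_per_million_requests(usd_per_hour: float, latency_s: float) -> float:
    """Instance cost per million sequential requests."""
    return usd_per_hour / sequential_requests_per_hour(latency_s) * 1_000_000


speedup = GPU_LATENCY_S / CPU_LATENCY_S                       # about 4.3x
gpu_rph = sequential_requests_per_hour(GPU_LATENCY_S)         # about 83,721
cpu_rph = sequential_requests_per_hour(CPU_LATENCY_S)         # about 360,000
gpu_cost = usd_per_million_requests(T4_USD_PER_HOUR, GPU_LATENCY_S)  # about $7.17

if __name__ == "__main__":
    print(f"CPU is {speedup:.1f}x faster per request")
    print(f"GPU: {gpu_rph:,.0f} req/h, ${gpu_cost:.2f} per million requests")
    print(f"CPU: {cpu_rph:,.0f} req/h per worker")
```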
StackOne combines high accuracy (87%+ F1 on every benchmark) with an edge-deployable size (22 MB), the sweet spot for production deployments.
Best accuracy plus edge-deployable size (up to 48x smaller than competitors)
91% average F1 vs DistilBERT's 86%: better accuracy and better efficiency
All metrics at a glance
| Model | Avg F1 | Size (MB) | Latency (ms) | FP Rate | Hardware | Consistency |
|---|---|---|---|---|---|---|