disarm

Disarm hostile text.

A compiled-Rust toolkit that canonicalizes and neutralizes adversarial Unicode — homoglyph spoofing, bidi / Trojan-Source, zalgo, and invisible characters — before it reaches your classifiers, indexes, logs, and identifiers.

pip install disarm · cargo add disarm

Adversarial-text defense

TR39 confusable folding, bidi / zero-width / control stripping, and zalgo capping — the Unicode attack surface most pipelines never check.
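
To make two of these defenses concrete, here is a minimal sketch using only Python's standard library. It is not disarm's API or implementation: the function names `strip_invisible` and `cap_combining`, the character sets, and the default cap of two marks are all assumptions chosen for illustration. TR39 confusable folding is left out because it needs the Unicode confusables data, which the standard library does not ship.

```python
import unicodedata

# Illustrative subsets only (assumed for this sketch, not disarm's tables).
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
    "\u200e", "\u200f",                                # LRM, RLM
}
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def strip_invisible(text: str) -> str:
    """Drop bidi controls, zero-width characters, and other control/format characters."""
    out = []
    for ch in text:
        if ch in BIDI_CONTROLS or ch in ZERO_WIDTH:
            continue
        # Cc = control, Cf = format. Newline and tab are kept.
        # Note: removing ZWJ/ZWNJ also breaks emoji sequences and some scripts.
        if unicodedata.category(ch) in ("Cc", "Cf") and ch not in "\n\t":
            continue
        out.append(ch)
    return "".join(out)


def cap_combining(text: str, max_marks: int = 2) -> str:
    """Keep at most max_marks consecutive combining marks after a base character."""
    out = []
    run = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run > max_marks:
                continue  # drop marks beyond the cap
        else:
            run = 0
        out.append(ch)
    return "".join(out)
```

For example, `strip_invisible("ad\u202emin\u200b")` yields `"admin"`, and `cap_combining` trims a base letter stacked with five acute accents down to two.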

Fast, compiled core

A Rust engine with compile-time perfect-hash tables and a single Python boundary crossing. No regex, no per-character Python loops. See benchmarks →

Safe by construction

unsafe_code = "forbid" across the entire codebase. Memory safety isn't traded for speed — it's a property of the build.

Honest about scope

disarm normalizes input. It is not an output sanitizer: you still need to escape or encode for the destination (HTML, SQL, shell) at your sink. The threat model says exactly what is and isn't covered.
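
As a small illustration of encoding at the sink, the sketch below HTML-escapes text at the point where it is rendered, regardless of any normalization applied earlier. It uses only the standard library's `html.escape`; the function name `render_comment` is hypothetical and no disarm call is shown.

```python
import html


def render_comment(normalized_text: str) -> str:
    """Build an HTML fragment, escaping at the sink.

    Normalizing Unicode upstream does not neutralize markup, so the text is
    escaped here no matter what was done to it earlier.
    """
    return "<p>" + html.escape(normalized_text, quote=True) + "</p>"
```

The same principle applies to other sinks: parameterized queries for SQL, argument lists rather than shell strings for subprocesses.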

Broad coverage

Transliteration, slugification, filename safety, and Unicode normalization across 80+ language profiles and many scripts.
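
To show why per-language profiles matter, here is a deliberately naive slug function built on the standard library alone. It is not disarm's algorithm, and the name `naive_slug` is an assumption for this sketch. It decomposes with NFKD and discards everything outside ASCII, which works for accented Latin text but erases other scripts entirely.

```python
import unicodedata


def naive_slug(text: str) -> str:
    """ASCII-only slug: NFKD-decompose, drop non-ASCII, join words with hyphens."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace anything that is not a letter or digit with a space, then split.
    words = "".join(ch.lower() if ch.isalnum() else " " for ch in ascii_only).split()
    return "-".join(words)
```

`naive_slug("Crème Brûlée!")` gives `"creme-brulee"`, but `naive_slug("Привет")` gives an empty string, because Cyrillic letters have no ASCII decomposition. Real transliteration needs script-aware mapping tables to close that gap.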

Drop-in friendly

Compatibility aliases for Unidecode, python-slugify, and pathvalidate make migration a one-line change.

“A defense-in-depth layer, not a complete control.” disarm reduces a specific, enumerated attack surface — and documents the rest.