Vortex security checks

Aug 31, 2025

Quick updates on Vortex:

In building Vortex, our approach is to treat prompts and responses as two distinct surfaces to monitor.

Input checks

Prompt injection - We’re building detection for adversarial prompts. This covers direct attacks (“Ignore previous instructions…”), indirect attacks hidden in documents, and the role-playing scenarios used for jailbreaking.
Sensitive data leakage - We’re building filters that detect sensitive information in near real time by checking content against a trusted source:
- Credentials, such as API keys;
- Proprietary source code;
- PII;
- Confidential information and file contents.
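
As a rough illustration of what input checks like these can look like, here is a minimal Python sketch. It is not Vortex's implementation: the pattern lists, the `Finding` type, and `check_prompt` are hypothetical names chosen for this example, and plain regular expressions will miss indirect and role-play attacks, which need more than keyword matching.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical patterns for illustration only; not Vortex's actual rules.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(the\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+(all\s+)?(the\s+)?(above|prior)\s+instructions", re.IGNORECASE),
]

# Assumed examples of sensitive data: an AWS-style access key ID and an email address.
SENSITIVE_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


@dataclass
class Finding:
    check: str  # which check fired
    match: str  # the text that triggered it


def check_prompt(prompt: str) -> List[Finding]:
    """Return every injection phrase or sensitive value found in the prompt."""
    findings = []
    for pattern in INJECTION_PATTERNS:
        m = pattern.search(prompt)
        if m:
            findings.append(Finding("prompt_injection", m.group(0)))
    for name, pattern in SENSITIVE_PATTERNS.items():
        for m in pattern.finditer(prompt):
            findings.append(Finding(name, m.group(0)))
    return findings
```

A caller could block or flag a prompt whenever `check_prompt` returns a non-empty list, then log the `check` names for review.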

Output checks

Harmful/offensive content - Catch toxic, illegal, or otherwise malicious content that might get past the model’s built-in safety layers.
Illegal activities, misinformation, and disinformation - This one might be tricky.
Sensitive data leakage
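
On the output side, one simple way to handle sensitive data leakage is to redact matches before a response reaches the user. The sketch below assumes that approach. `REDACTION_PATTERNS` and `redact_response` are hypothetical names, and the two patterns are illustrative stand-ins, not Vortex's actual filters.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for illustration only; a real filter would cover far more.
REDACTION_PATTERNS = {
    "AWS_ACCESS_KEY_ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact_response(response: str) -> Tuple[str, List[str]]:
    """Replace sensitive matches with placeholders and report which labels fired."""
    hits = []
    for label, pattern in REDACTION_PATTERNS.items():
        # subn returns the rewritten text and the number of substitutions made.
        response, count = pattern.subn(f"[REDACTED {label}]", response)
        if count:
            hits.append(label)
    return response, hits
```

Returning the labels alongside the redacted text lets the caller decide whether to deliver the cleaned response or withhold it entirely.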

If you’d like to see Vortex in action or learn more about it, try it out here, or contact us here.

Best,
Chew
