AI Security: Protecting Systems From Those Who Would Exploit Them

AI systems are now becoming targets. Bad actors actively probe weaknesses, seeking to manipulate, deceive, and exploit AI systems for harmful ends. A model that behaves safely under normal conditions may be vulnerable to adversarial attacks that cause it to fail precisely the moments that matter most.

AI security is the field that addresses this threat landscape. It covers prompt injection, i.e. tricking a system into ignoring its instructions; data poisoning, i.e. corrupting training data to embed hidden behaviors; adversarial inputs, i.e. crafted to fool classifiers and perception systems; and model extraction, i.e. reverse-engineering proprietary systems through strategic queries. These are not hypothetical concerns. They are documented, recurring attack vectors that have been demonstrated across commercial, government, and research deployments.

The Lab’s security research assumes the presence of an intelligent adversary. This is a fundamentally different threat model from safety concerns with accidents and errors. Our work produces both technical defenses and operational guidance, equipping organizations to identify vulnerabilities before they are exploited and to respond effectively when they are.