Anthropic on Monday announced Constitutional Classifiers, a new system designed to protect artificial intelligence (AI) models from jailbreaking attempts. The safeguarding technique can detect a jailbreaking attempt at the input level and prevent the AI model from generating a harmful response as a result.
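The two-stage idea described above — screening the prompt before the model runs, then screening the generated text before it is returned — can be sketched roughly as follows. This is a minimal illustrative sketch, not Anthropic's actual implementation: every name here (`is_harmful_prompt`, `is_harmful_output`, `model`, `guarded_generate`) is a hypothetical placeholder, and the keyword checks stand in for what would really be trained classifier models.

```python
# Hypothetical sketch of an input/output classifier safeguard.
# All function names and checks below are illustrative placeholders,
# not Anthropic's API or method.

BLOCKED_MESSAGE = "Request blocked by safety classifier."

def is_harmful_prompt(prompt: str) -> bool:
    # Placeholder: a real input classifier would be a trained model
    # scoring the prompt against a set of content rules.
    return "jailbreak" in prompt.lower()

def is_harmful_output(text: str) -> bool:
    # Placeholder: a real output classifier would score the generated
    # text, not match a keyword.
    return "harmful" in text.lower()

def model(prompt: str) -> str:
    # Stand-in for the underlying language model.
    return f"Echo: {prompt}"

def guarded_generate(prompt: str) -> str:
    if is_harmful_prompt(prompt):
        return BLOCKED_MESSAGE  # stop before generation begins
    response = model(prompt)
    if is_harmful_output(response):
        return BLOCKED_MESSAGE  # stop before the output is returned
    return response
```

The key design point is that the guard runs at both ends: a prompt that slips past the input check can still be caught when its output is screened.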