Benign-to-Toxic Jailbreaking: Inducing Harmful Responses from Harmless Prompts
arXiv preprint, 2025
Optimization-based jailbreaking that induces safety misalignment from benign conditioning prompts.
arXiv preprint, 2025
Optimization-based jailbreaking that induces safety misalignment from benign conditioning prompts.