← Library
splApache-2.0from splunk/security_content

M365 Copilot Jailbreak Attempts

Detects M365 Copilot jailbreak attempts through prompt injection techniques including rule manipulation, system bypass commands, and AI impersonation requests that attempt to circumvent built-in safety controls. The detection searches exported eDiscovery prompt logs for jailbreak keywords like "pretend you are," "act as," "rules=," "ignore," "bypass," and "override" in the Subject_Title field, assigning severity scores based on the manipulation type (score of 4 for amoral impersonation or explicit rule injection, score of 3 for entity roleplay or bypass commands). Prompts with a jailbreak score of 2 or higher are flagged, prioritizing the most severe attempts to override AI safety mechanisms through direct instruction injection or unauthorized persona adoption.

Quality
67
FP risk
Forks
0
Views
0
Rule source🔒 locked
🔒

Sign in to view the rule source

Free accounts can view the source for the top-ranked rules. Create one in seconds — no credit card required.

Sign in →