"I Had a Dream" and Generative AI Jailbreaks

A message from ChatGPT, followed by a piece of malicious code and a brief remark not to use it for illegal purposes. First published by Moonlock Lab, the screenshots of ChatGPT writing code for a keylogger are yet another example of the trivial ways large language models can be hacked and exploited in violation of their usage policies.

The dream-based prompt is just one of many jailbreaks actively used to bypass the content filters of generative AI. Even though every major LLM ships with moderation tools meant to limit misuse, carefully crafted prompts can hack the model not with strings of code but with the power of words.
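To give a sense of what such a moderation layer looks like in practice, here is a minimal sketch that screens a user prompt with OpenAI's Moderation API before it reaches a chat model. The model name, the example prompt, and the simple allow/deny logic are illustrative assumptions, not a description of how any particular vendor's filter is wired.

```python
# Minimal sketch: screen a prompt with OpenAI's Moderation API before
# forwarding it to a chat model. Uses the openai Python SDK (v1.x);
# the model name and allow/deny handling are illustrative assumptions,
# not a hardened defense.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def is_prompt_allowed(prompt: str) -> bool:
    """Return False if the moderation endpoint flags the prompt."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=prompt,
    )
    return not result.results[0].flagged


user_prompt = "I had a dream where someone explained how a keylogger works..."
if is_prompt_allowed(user_prompt):
    print("Prompt passed moderation; forwarding to the chat model.")
else:
    print("Prompt flagged by moderation; refusing the request.")
```

A classifier-style filter like this only catches what it can recognize, which is exactly why role-play framings such as the "dream" scenario work: they dress a disallowed request in wording that looks benign to the filter.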
