"I Had a Dream" and Generative AI Jailbreaks

A message from ChatGPT, followed by a piece of malicious code and a brief remark not to use it for illegal purposes. Initially published by Moonlock Lab, the screenshots of ChatGPT writing keylogger malware are yet another example of how trivially large language models can be hacked and exploited in violation of their usage policies.

The dream scenario is just one of many jailbreaks actively used to bypass the content filters of generative AI. Even though every LLM ships with moderation tools that limit its misuse, carefully crafted prompts can hack the model not with strings of code but with the power of words.

