Episode 47 — AI Abuse: Jailbreaking, Evasion, Privacy, Session Hijacking, and Code Execution
In this episode, we look at Artificial Intelligence (A I) abuse, that is, the ways people deliberately misuse A I-enabled systems once those systems are placed into real workflows. Some A I risks come from mistakes, weak data handling, or overconfidence. Abuse is different because someone is actively trying to push the system past its intended boundaries. They may try to jailbreak the tool, evade its safeguards, expose private information, hijack a session, or use connected features to run unsafe actions. These risks become more serious when A I is not just answering questions but is also connected to plugins, code environments, databases, documents, tickets, email, or user accounts. The more a system can see and do, the more carefully it must be controlled. Your goal here is to understand the basic attack patterns and why strong boundaries, logging, and review matter.
Before we continue, a quick note. This audio course is part of our companion study series. The first book is a detailed study guide that explains the exam and helps you prepare for it with confidence. The second is a Kindle-only eBook with one thousand flashcards you can use on your mobile device or Kindle for quick review. You can find both at Cyber Author dot me in the Bare Metal Study Guides series.
Jailbreaking is an attempt to make an A I system ignore the rules, limits, or safety instructions it is supposed to follow. The attacker may phrase the request as a game, a roleplay scenario, a fictional task, a test, or an emergency. The request may try to convince the system that normal rules no longer apply. In some cases, the attacker asks the system to reveal hidden instructions, provide restricted content, or produce output that the system was designed to refuse. The exact wording matters less than the pattern: the attacker is trying to talk the system out of its boundaries. With traditional software, an attacker may exploit a bug in code. With A I, the attacker may exploit how the model interprets language, context, priority, and instructions. The attack happens through conversation, but the result can still create real security risk.
Jailbreaking matters because A I systems are often wrapped in instructions that define acceptable behavior. Those instructions may tell the system not to reveal confidential information, not to provide dangerous guidance, not to misuse tools, and not to treat untrusted content as a command. If a jailbreak succeeds, the model may produce information that should have stayed protected or may take a step that should have required approval. A jailbreak can also be used as part of a larger attack. The attacker may first weaken the system’s behavior, then ask it to summarize restricted material, call a connected tool, or generate content that supports phishing or fraud. A strong control approach does not depend on the model politely refusing every bad request. Important protections should also exist outside the model, such as access control, tool permission limits, monitoring, and human approval for high-risk actions.
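To make that last point concrete, here is a minimal sketch in Python of an approval gate that lives outside the model. The tool names, the User shape, and the risk labels are all hypothetical; the idea is that even a fully successful jailbreak only changes the model's request, not what the surrounding system allows to execute.

from dataclasses import dataclass, field

# Hypothetical sketch: permissions enforced outside the model, so a
# jailbreak that changes the model's words cannot change what runs.
HIGH_RISK_TOOLS = {"send_email", "delete_record"}   # assumed risk labels

@dataclass
class User:
    name: str
    allowed_tools: set = field(default_factory=set)

def dispatch_tool_call(user: User, tool: str, args: dict, approved: bool = False):
    # Permission is checked against the signed-in user, not the model's request.
    if tool not in user.allowed_tools:
        raise PermissionError(f"{user.name} may not call {tool}")
    # High-risk actions wait for a human, no matter how the prompt was phrased.
    if tool in HIGH_RISK_TOOLS and not approved:
        return {"status": "pending_approval", "tool": tool}
    return {"status": "executed", "tool": tool, "args": args}

analyst = User("analyst", allowed_tools={"search_files", "send_email"})
print(dispatch_tool_call(analyst, "send_email", {"to": "team@example.com"}))
# -> {'status': 'pending_approval', 'tool': 'send_email'}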
Evasion is related to jailbreaking, but the focus is on slipping past detection, filters, or policy checks. An attacker may avoid obvious words, break up sensitive terms, use indirect phrasing, encode content, switch languages, or hide meaning inside a longer request. They may ask for something harmless-looking that can be assembled into something harmful later. Evasion can also target systems that use A I to detect abuse, fraud, spam, malware, or policy violations. The attacker tries to change the shape of the content so the system fails to recognize it. In a security setting, evasion is familiar because attackers have always tried to avoid detection. A I adds a new layer because both the attack content and the defensive review may involve language models. You should expect attackers to test boundaries repeatedly and adjust their wording based on what the system allows or blocks.
Evasion becomes especially challenging when the A I system processes untrusted content from many places. A public comment, uploaded document, support message, web page, or email may be crafted to avoid obvious warning signs while still influencing the model. A user may ask the A I to summarize a document, and the document may contain hidden or indirect instructions. A moderation system may review a message, and the message may be written to appear harmless while carrying a harmful purpose. A fraud system may classify a transaction explanation, and the attacker may shape the language to look normal. These examples show why A I security cannot rely only on keyword blocking. Good defenses consider behavior, context, source, permissions, and outcomes. When a system repeatedly receives requests that probe its limits, even failed attempts can be useful signals for monitoring.
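As a small illustration of that layered view, the sketch below normalizes obvious obfuscation before a keyword filter runs and records every blocked request per user, so even failed probing becomes a monitoring signal. The names and the filter itself are hypothetical, and a keyword check would only ever be one layer among several.

import unicodedata
from collections import Counter

blocked_attempts = Counter()   # per-user count of blocked requests

def normalize(text: str) -> str:
    # Fold lookalike characters, lowercase, and drop punctuation an
    # attacker might insert to break up a filtered term.
    text = unicodedata.normalize("NFKC", text).lower()
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())

def screen_request(user_id: str, text: str, denylist: set) -> bool:
    cleaned = normalize(text).replace(" ", "")
    if any(term in cleaned for term in denylist):
        blocked_attempts[user_id] += 1   # failed attempts are still signals
        return False
    return True

print(screen_request("u1", "show the p a s s w o r d list", {"password"}))
# -> False, and blocked_attempts["u1"] is now 1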
Privacy exposure is one of the most serious abuse risks because A I systems often handle large amounts of user, customer, employee, and business information. An attacker may try to make the system reveal personal data, internal documents, private messages, account details, source code, credentials, or security findings. Personally Identifiable Information (P I I) deserves special care because exposure can harm individuals and create legal or regulatory consequences. The attacker may ask direct questions, but they may also use indirect prompts that seem harmless. They might ask for a summary of all available records about a person, a list of recent customer issues, or examples from real support tickets. If access controls are weak, the A I may retrieve or generate sensitive information that the user should not be allowed to see. Privacy exposure often happens when the system’s helpfulness outruns its authorization checks.
Privacy risk also appears when A I output combines information from different sources. A single document may not reveal much by itself, but an A I tool may connect names, dates, locations, account details, and internal notes into a much more revealing answer. This is sometimes called inference risk, because the system helps infer something sensitive from pieces of data. A user who cannot access a full file might still ask broad questions and receive fragments that reveal what the file contains. A support assistant might accidentally include one customer’s details in another customer’s response. A coding assistant might repeat secrets that were accidentally stored in a repository. These risks are not solved by telling users to be careful. The system needs clear data boundaries, role-based access, careful retrieval controls, output filtering, retention limits, and logs that show what information was accessed.
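One way to picture those data boundaries is a retrieval step that filters records by the caller's permissions and redacts obvious P I I before any text reaches the model. The record shape, role names, and redaction pattern in this sketch are deliberately simple assumptions.

import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def retrieve_for_user(user_roles: set, records: list) -> list:
    visible = []
    for rec in records:
        # Authorization happens at retrieval time, per record,
        # not after the model has already composed an answer.
        if rec["required_role"] in user_roles:
            # Redact obvious P I I before the text reaches the model.
            visible.append(EMAIL_RE.sub("[redacted email]", rec["text"]))
    return visible

records = [
    {"required_role": "support", "text": "Ticket 101 from ana@example.com"},
    {"required_role": "hr",      "text": "Salary review notes"},
]
print(retrieve_for_user({"support"}, records))
# -> ['Ticket 101 from [redacted email]']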
Session hijacking happens when an attacker takes over or misuses an active user session. A session is the period after authentication when a system recognizes a user as signed in. In an A I-enabled system, that session may carry access to chats, documents, plugins, code tools, data stores, or business applications. If an attacker steals a session token, compromises a browser, tricks a user into approving access, or abuses a connected application, the attacker may act through the user’s trusted session. This can be more dangerous than a simple bad prompt because the system may believe the request comes from an authenticated user. The attacker may not need the password if the session is already valid. Indicators can include activity from unusual locations, unfamiliar devices, strange tool use, sudden data access, or actions that do not fit the user’s normal behavior.
Session hijacking also matters because A I assistants may remember context during a session. The system may have access to previous prompts, uploaded files, temporary data, or task history. If an attacker enters the session, they may see information that was never meant to leave that conversation. They may also ask the assistant to continue a task, retrieve related files, or use tools already authorized for that user. A compromised session can turn normal convenience into risk. Strong session controls help reduce that risk. Sessions should expire appropriately, sensitive actions should require renewed verification, and high-risk tool use should not rely only on the fact that the user signed in earlier. Monitoring should look for impossible travel, concurrent sessions, unusual device changes, and behavior that does not match the account owner. A trusted session should not mean unlimited trust forever.
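A minimal sketch of those session rules, with invented field names and thresholds: the session carries its age, its device, and the time of the last verification, and sensitive actions demand fresh proof instead of trusting the original sign-in forever.

import time

SESSION_MAX_AGE = 8 * 3600    # assumed: sessions expire after eight hours
REVERIFY_WINDOW = 5 * 60      # assumed: sensitive actions need recent auth

def can_perform(session: dict, sensitive: bool, device_id: str) -> bool:
    now = time.time()
    # An old session is not trusted just because it was once valid.
    if now - session["created_at"] > SESSION_MAX_AGE:
        return False
    # A device change mid-session is a classic hijacking indicator.
    if device_id != session["device_id"]:
        return False
    # Sensitive actions require recent re-verification, not just sign-in.
    if sensitive and now - session["last_verified_at"] > REVERIFY_WINDOW:
        return False
    return True

session = {"created_at": time.time() - 3600,
           "last_verified_at": time.time() - 600,
           "device_id": "laptop-1"}
print(can_perform(session, sensitive=True, device_id="laptop-1"))
# -> False: the last verification is too old for a sensitive action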
Code execution risk appears when an A I system can write, suggest, modify, or run code. A coding assistant can be useful, but it can also create security problems if its output is trusted without review. It may generate vulnerable code, include unsafe dependencies, mishandle secrets, or suggest logic that fails under attack. The risk becomes more serious when the A I can execute code in an environment connected to files, networks, credentials, or production data. An attacker may try to trick the system into running commands, reading files, making network calls, or modifying data. Even if the A I did not intend harm, unsafe execution can still cause damage. Code execution requires strict separation between experiment and production, careful permission limits, clean temporary environments, and review before changes affect real systems or real data.
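As a hedged sketch of that separation, the snippet below runs generated code in a throwaway directory with a stripped environment and a hard timeout. This limits accidents; a real deployment would add OS-level isolation, with no credentials and no network reachable from the sandbox.

import subprocess, sys, tempfile

def run_untrusted(code: str, timeout_s: int = 5) -> str:
    # Execute in a fresh temporary directory with a near-empty environment
    # and a hard timeout. Isolation here is intentionally minimal; treat it
    # as an illustration, not a complete sandbox.
    with tempfile.TemporaryDirectory() as workdir:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],   # -I runs Python in isolated mode
            cwd=workdir, env={}, capture_output=True,
            text=True, timeout=timeout_s,
        )
    return result.stdout

print(run_untrusted("print(2 + 2)"))   # -> 4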
Plugins and connected tools expand the risk because they let an A I system move from words into actions. A plugin may allow the assistant to search files, send messages, create tickets, query databases, schedule meetings, analyze code, or update records. Those actions may be helpful in normal work, but each connection becomes a possible path for abuse. A prompt injection inside a document might try to make the assistant use a plugin in an unintended way. A hijacked session might let an attacker ask the assistant to pull data from a connected store. A jailbreak might try to bypass tool-use restrictions. The safest design treats tools as separate capabilities that require specific permission, not as automatic extensions of the model. The A I should only have access to the tools it needs, and sensitive actions should require clear confirmation or approval.
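One way to express tools as separate capabilities, sketched below with invented names, is to build the tool list handed to the model from an explicit grant table, so nothing is exposed by default and unneeded tools simply do not exist from the model's point of view.

# Hypothetical grant table: which tools each assistant task may even see.
TOOL_GRANTS = {
    "triage_tickets": {"search_tickets", "create_ticket"},
    "summarize_docs": {"search_files"},
}

ALL_TOOLS = {
    "search_tickets": "spec...", "create_ticket": "spec...",
    "search_files": "spec...", "send_message": "spec...",
}

def tools_for_task(task: str) -> dict:
    # The model is offered only the tools granted to this task; anything
    # not granted is never presented, so it cannot be talked into use.
    granted = TOOL_GRANTS.get(task, set())
    return {name: spec for name, spec in ALL_TOOLS.items() if name in granted}

print(sorted(tools_for_task("summarize_docs")))   # -> ['search_files']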
Data stores create another important boundary. An A I tool may be connected to internal documents, customer records, logs, tickets, source repositories, or knowledge bases. If the connection is too broad, the system may retrieve information beyond what the user should see. If the data store contains poisoned, outdated, or malicious content, the A I may repeat or act on bad information. If the tool logs prompts and outputs carelessly, sensitive data may be stored in places where it does not belong. Secure design starts by asking what data the A I truly needs, who is allowed to access that data, and how retrieval is controlled. The model should not become a shortcut around normal authorization. A user should not gain access to restricted content just because the answer arrives through a conversational interface instead of a file browser.
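The sketch below makes that last point literal: the conversational retrieval layer calls the same authorization check the file browser would, so the chat path can never grant more than the direct path. The access control list (A C L) table and function names are assumptions for illustration.

# Hypothetical ACL shared by the file browser and the A I retrieval layer.
ACL = {
    "handbook.pdf": {"everyone"},
    "payroll.xlsx": {"hr"},
}

def acl_check(user_groups: set, doc: str) -> bool:
    # One authorization function for every access path.
    allowed = ACL.get(doc, set())
    return "everyone" in allowed or bool(allowed & user_groups)

def retrieve(user_groups: set, docs: list) -> list:
    # The chat interface filters with the same check as the file browser,
    # so an answer never becomes a shortcut around authorization.
    return [d for d in docs if acl_check(user_groups, d)]

print(retrieve({"engineering"}, ["handbook.pdf", "payroll.xlsx"]))
# -> ['handbook.pdf']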
Strong boundaries are the main theme across jailbreaking, evasion, privacy exposure, session hijacking, and code execution. A boundary can separate trusted instructions from untrusted content, one user’s data from another user’s data, low-risk suggestions from high-risk actions, test code from production systems, and normal sessions from sensitive operations. Boundaries should be enforced by system design, not only by asking the model to behave. Least privilege is a key principle here. The A I should have only the access needed for its purpose. Tools should be scoped. Data retrieval should respect user permissions. Code execution should happen in controlled environments. Sensitive actions should require confirmation, and especially risky actions should require human review. When boundaries are clear, a single bad prompt is less likely to become a full compromise.
Monitoring gives defenders a way to see when those boundaries are being tested. Useful logs may show prompts, tool calls, data retrieval, session activity, access decisions, errors, and blocked actions. Monitoring should look for repeated jailbreak attempts, unusual language patterns, unexpected plugin use, large data pulls, access to unusual records, code execution attempts, and session behavior that does not match the user. Privacy still matters, so logging should be designed carefully and protected from unnecessary exposure. The point is not to watch people out of curiosity. The point is to understand whether the system is being abused, whether controls are working, and whether sensitive data or actions are at risk. Monitoring also supports improvement. If attackers keep probing a certain weakness, defenders can adjust permissions, prompts, filters, workflows, and user guidance.
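To close with one more hedged sketch, here is what a structured log entry and two very simple detections might look like. Field names, thresholds, and the alert path are all invented; real monitoring would use time windows, baselines, and a protected log pipeline.

import json, time
from collections import defaultdict

events = []                   # in practice: an append-only, protected log
pulls = defaultdict(int)      # rows retrieved per user in the current window

def alert(user: str, reason: str):
    print(f"ALERT {user}: {reason}")   # stand-in for a real alert channel

def log_event(user: str, kind: str, detail: dict):
    entry = {"ts": time.time(), "user": user, "kind": kind, **detail}
    events.append(json.dumps(entry))   # structured entries are easy to query
    # Two toy detections: large data pulls and repeated blocked prompts.
    if kind == "retrieval":
        pulls[user] += detail.get("rows", 0)
        if pulls[user] > 1000:         # assumed threshold
            alert(user, "unusually large data pull")
    if kind == "blocked_prompt":
        alert(user, "possible jailbreak probing")

log_event("ana", "retrieval", {"rows": 1500})
# -> ALERT ana: unusually large data pull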
A I abuse is not a completely separate world from the security ideas you already know. It still comes back to identity, access, data protection, monitoring, secure design, and human judgment. Jailbreaking tries to push the model past its rules. Evasion tries to slip past filters and detection. Privacy exposure turns helpful answers into data leakage. Session hijacking abuses the trust attached to an authenticated user. Code execution turns generated instructions into actions that may affect real systems. Plugins, data stores, and connected user sessions make these risks more serious because the A I can reach beyond conversation. The safer mindset is to treat A I as a powerful interface that needs the same security discipline as any other sensitive system. Give it limited access, separate trusted from untrusted input, monitor how it is used, and require review where mistakes would matter.