Episode 45 — AI Threats: Model Manipulation, Poisoning, and Prompt Injection

In this episode, we look at Artificial Intelligence (A I) threats that affect how systems produce answers, make suggestions, sort information, and support decisions. A I can feel very different from traditional software because it may respond in flexible language instead of predictable menus and buttons. That flexibility is useful, but it also creates new ways for attackers to influence the system. An attacker may try to manipulate the model’s behavior, poison the information it learns from or retrieves, or hide instructions inside content the system processes. You do not need to become an A I engineer to understand the security risk. At the Security Plus level, your focus is on the basic pattern. A system accepts information, uses that information to produce an output, and someone may try to shape the input so the output becomes unsafe, misleading, unauthorized, or harmful.

Before we continue, a quick note. This audio course is part of our companion study series. The first book is a detailed study guide that explains the exam and helps you prepare for it with confidence. The second is a Kindle-only eBook with one thousand flashcards you can use on your mobile device or Kindle for quick review. You can find both at Cyber Author dot me in the Bare Metal Study Guides series.

A traditional application usually follows rules that developers wrote directly. If you click a button, submit a form, or request a page, the application follows a defined path. A I systems may still have rules, but they also rely on models, prompts, training data, reference data, and context. That means the system may be influenced by information that does not look like a command in the normal sense. A customer message, uploaded document, support ticket, web page, chat history, or knowledge base article may become part of what the A I uses to respond. This is why the trust boundary can feel less obvious. With a normal form field, you may ask whether the input is valid. With A I, you also ask whether the input is trying to steer the system, change its priorities, expose information, or misuse connected tools.

Model manipulation is a broad term for attempts to influence how an A I model behaves. The attacker may want the model to ignore safety instructions, provide restricted information, produce biased output, reveal sensitive data, or make a bad recommendation. Sometimes the manipulation happens through direct interaction, such as a user typing carefully crafted instructions into a chatbot. Sometimes it happens indirectly, through content the A I reads and summarizes. The danger is that the model may treat attacker-controlled text as meaningful context. A model that helps with customer support, document review, code analysis, or security triage may be exposed to untrusted material all day. If the model cannot separate trusted instructions from untrusted content, it may follow the wrong guidance. The attacker’s goal is not always to break the system visibly. Sometimes the goal is to quietly make the system less reliable.

Model manipulation can be especially risky when A I output affects decisions. A harmless-looking answer may lead someone to approve access, ignore an alert, trust a fake document, or choose the wrong response to an incident. If the model is used only for drafting low-risk text, the harm may be limited. If the model is connected to ticket routing, fraud review, hiring support, customer identity checks, or security analysis, the stakes become higher. An attacker may try to make malicious activity look normal, make a dangerous file seem safe, or make a fake request appear urgent and legitimate. You should think about A I output as advice that needs context, not as automatic truth. The more authority the system has, the more important it is to control what information can influence it and how its outputs are reviewed.

Data poisoning happens when an attacker corrupts the data used to train, tune, test, or guide an A I system. The poisoned data may teach the model the wrong pattern, hide a dangerous pattern, or cause a specific bad response under certain conditions. Training data poisoning can happen when a model learns from a dataset that includes attacker-influenced examples. Reference data poisoning can happen when an A I tool uses a knowledge base, document repository, web content, or support articles that contain misleading or malicious information. The attacker may not need to touch the model itself. If the model relies on a trusted source that has been polluted, the output can still become unsafe. The risk is simple to picture. If you feed bad information into a system that learns from information, you may get bad decisions back.

Poisoning can be quiet because the poisoned material may look ordinary. Imagine an internal knowledge base article that has been changed to include a false support process. An A I assistant that answers employee questions from that knowledge base may repeat the false process with confidence. Imagine a product review system where fake reviews train a model to classify scam listings as trustworthy. Imagine a security tool that uses mislabeled examples and begins treating a certain malicious pattern as normal. In each case, the attacker is not only attacking the final answer. The attacker is attacking the information environment around the answer. Indicators may include sudden changes in model behavior, repeated wrong answers tied to certain topics, unusual edits to reference documents, unexpected data sources, or outputs that cite or reflect untrusted material. The control begins with protecting the data pipeline.

Prompt injection is one of the most important A I attack ideas to understand. A prompt is the instruction or context given to the model so it knows what to do. In a direct prompt injection attack, the user writes instructions that try to override the system’s intended behavior. The attacker might tell the model to ignore previous instructions, reveal hidden guidance, bypass restrictions, or produce information it should not provide. In an indirect prompt injection attack, the attacker hides instructions inside content the A I later reads. That content might be a web page, email, document, ticket, transcript, or data field. The A I may be asked to summarize the content, but the hidden instruction inside the content may tell it to do something else. The danger is that the model may follow instructions from the wrong source.

Indirect prompt injection is especially important because the person using the A I may never see the attack. You might ask an A I assistant to summarize a web page, and the web page contains hidden text telling the assistant to reveal private notes or change its answer. You might ask an assistant to process an email, and the email contains instructions telling the assistant to mark the message as safe or forward information somewhere else. You might upload a document for review, and the document includes buried instructions that try to control the review. The user thinks the A I is reading content. The attacker hopes the A I treats malicious content as a command. This is different from ordinary phishing, but the trust problem feels familiar. The system is being tricked into trusting a message that came from an untrusted source.

The risk grows when an A I system is connected to tools, accounts, files, calendars, code, ticketing systems, or databases. A standalone chatbot that only answers general questions has limited ability to cause direct harm. An A I assistant that can read email, search internal documents, create tickets, run workflows, or update records has much more power. Prompt injection against that kind of system may try to make it misuse those connections. The attacker may want the A I to retrieve sensitive data, send a message, change a setting, or make a recommendation that causes a person to do those things manually. This is why A I permissions should be treated carefully. The model should not automatically have broad access just because the user does. Connected tools need clear limits, strong authorization, logging, and human review for higher-risk actions.

A useful control idea is separation of trust. Instructions from system designers, application owners, users, external documents, and untrusted content should not all have the same authority. A secure design should treat external content as data to be analyzed, not as instructions to be obeyed. That is easy to say and harder to guarantee, but the principle matters. The A I should be guided to summarize an email without following instructions inside the email that change its own behavior. It should review a web page without treating hidden web page text as higher priority than the system’s safety rules. It should answer from a knowledge base while still recognizing that the knowledge base itself needs access control and change management. The more clearly the system separates trusted instructions from untrusted content, the harder it is for an attacker to hijack the interaction.

Another control idea is limiting what the A I can access and do. This follows the same security principle you already know from identity and access management: least privilege. If an A I assistant only needs to summarize documents in one approved repository, it should not have broad access to every file, mailbox, or administrative function. If it can suggest an action, that does not mean it should be allowed to perform the action automatically. Higher-risk actions may require confirmation, approval, or a separate trusted workflow. The system should also avoid exposing hidden prompts, sensitive context, credentials, tokens, or private data in outputs. Access decisions should happen outside the model whenever possible, using normal security controls. The model can help interpret information, but it should not become the only gatekeeper for authorization.

Monitoring and testing also matter because A I behavior can change in ways that are harder to predict than traditional application behavior. Security teams may test whether prompts can bypass rules, whether poisoned documents influence answers, and whether the model reveals sensitive information when pressured. They may watch for unusual queries, repeated attempts to override instructions, strange tool-use patterns, or outputs that suddenly shift tone or content. Logging should capture enough information to support investigation while still protecting privacy and sensitive data. Feedback processes are also important because people using the system may be the first to notice that an answer seems wrong, unsafe, or manipulated. A I security is not a one-time setting. It requires ongoing review of the model, the data sources, the connected tools, and the way people actually use the system.

Good data governance reduces both poisoning and manipulation risk. The organization should know which data sources are used, who can change them, how changes are reviewed, and how old or unreliable content is removed. Trusted knowledge bases need ownership. Training and reference data need quality checks. Sensitive data should not be casually included in prompts, logs, or model context. If a system uses retrieval from internal documents, those documents need the same access control and classification discipline you would expect anywhere else. A I does not make messy data safe. In many cases, it makes messy data more influential because it can turn that data into confident language. When the source material is protected, reviewed, and limited to what the system truly needs, the model has fewer opportunities to repeat poisoned, outdated, or unauthorized information.

At this level, you can think of A I threats as attacks on influence. Model manipulation tries to influence behavior. Poisoning tries to influence the information the system learns from or relies on. Prompt injection tries to influence the instructions the system follows. The attacker may not be stealing a password or exploiting a buffer overflow. They may be shaping the words, examples, documents, or context that guide the system. The main controls are familiar even if the technology feels new. Use least privilege. Protect data sources. Separate trusted instructions from untrusted content. Validate and monitor outputs. Require human review for risky actions. Keep logs. Test the system against misuse. Treat A I as part of the security architecture, not as a magic layer above it. If you remember that A I systems are influenced by what they receive, you will be better prepared to recognize when someone is trying to influence them in the wrong direction.

Episode 45 — AI Threats: Model Manipulation, Poisoning, and Prompt Injection
Broadcast by