Most startup AI incidents do not begin with a novel model exploit. They begin with a useful feature reaching production before anyone forced the launch review to answer a few hard questions about trust boundaries, permissions, and failure handling. The model may look impressive in demos. That does not mean the workflow is ready for production pressure.
That is the pattern Unit 42's research helps clarify. AI features tend to break in a small number of predictable ways: untrusted instructions get treated like authority, tools inherit more power than they should, retrieval pipelines feed the model unsafe context, outputs get reused without sanitization, and nobody can explain what happens when the system misbehaves. For a security engineer or AppSec lead, that is useful because it turns "AI risk" into a launch checklist.
If your team cannot answer these ten questions before release, it is not shipping an AI feature with confidence. It is shipping an experiment with production access.
The 10 questions, in plain English
A good AI feature security checklist does not ask whether the model is "secure" in the abstract. It asks where the system can be steered, what it can touch, what it can expose, and how quickly you can contain damage. These ten questions cover the failure modes that show up most often in first-generation LLM launches:
Where can untrusted instructions enter the workflow?
Map every place the model reads text or files from users, tickets, documents, web pages, transcripts, or retrieved content. If the team only thinks about the chat box, it has not mapped the real prompt injection surface area.
What wins when the system prompt, retrieved content, tool output, and user input disagree?
Production failures often start when teams assume the model will reliably honor the highest-priority instruction. You need a concrete trust model for competing context, not a vague belief that the system prompt always wins.
Which tools can the model call, and what is the minimum permission each one truly needs?
An AI feature with broad Slack, CRM, ticketing, or admin permissions turns a prompt injection into an operations incident. Scope tools to the narrowest read and write paths that still support the product promise.
How do you know the retrieval pipeline is clean, current, and tenant-safe?
Retrieval pipeline integrity matters as much as model choice. If documents can be poisoned, stale, mis-tagged, or cross-tenant, the model can be manipulated with content your application supplied to it.
What sensitive data could the model reveal if an attacker successfully redirects it?
Security review should assume the assistant will eventually be pressured into oversharing. Enumerate what secrets, internal notes, PII, credentials, or customer records are reachable through the workflow before launch, not after a bad transcript exists.
Are outputs sanitized before they are rendered, executed, or passed to downstream systems?
LLM output is not safe just because it came from your model. If the response flows into HTML, Markdown, SQL, shell commands, templates, email, or automation rules, output sanitization and strict validation become production controls, not cleanup work.
Which actions still require deterministic checks or human approval?
Do not let the model be the last word on account changes, refunds, security actions, or external communications. High-impact actions need rule-based gates, allowlists, or human review even when the model drafts the recommendation.
What logs let you reconstruct a bad answer, tool call, or data disclosure?
You need enough telemetry to answer what the model saw, what it was told, what it retrieved, what tool it called, and what it returned. Without that, incident response turns into guesswork and buyer explanations become weak immediately.
How fast can you disable the feature, revoke a tool, or rotate credentials if behavior goes sideways?
Incident response readiness is part of launch readiness. If the only kill switch is a code deploy or a stressed engineer editing production config by hand, your containment plan is not real.
Who owns remediation, communication, and relaunch criteria after an AI security incident?
First-time AI launches often fail organizationally, not just technically. Product, security, support, and legal should already know who declares an incident, who communicates with buyers, and what evidence is required before the feature ships again.
A realistic launch scenario
Imagine a startup shipping its first AI support copilot. The assistant reads inbound tickets, retrieves knowledge-base articles, summarizes account history, drafts replies, and can open internal tickets when a case looks urgent. The team tests for quality, connects the model to the right APIs, and gets strong feedback internally. Launch feels close.
Then a customer submits a ticket with a pasted log bundle that includes hidden instructions telling the assistant to ignore prior guidance, pull anything relevant from internal notes, and draft a response that exposes the exact admin steps for fixing the issue. At the same time, one of the retrieved knowledge-base pages contains stale guidance from another tenant environment, and the tool that creates follow-up tasks still has permission to post into a shared incident Slack channel.
No single component looks catastrophic on its own. The failure comes from the chain: untrusted content enters the prompt, retrieval adds bad context, the model over-trusts what it read, and a tool with broad permissions acts on the result. That is why launch review has to cover the workflow, not just the model endpoint.
Why buyers and security leaders care
Buyers care because this is where LLM production security stops being a technical curiosity and becomes a procurement question. Enterprise prospects increasingly ask how AI features handle prompt injection, what logs exist for investigation, whether outputs are filtered before action, and how customer data is kept out of the wrong context. If your team cannot answer those questions crisply, the AI feature security checklist effectively gets written by the buyer during due diligence instead of by your team before launch.
Security leaders care because the downside is larger than a strange chatbot answer. A redirected assistant can leak customer data, trigger actions in internal systems, create false records, or force support and engineering teams into emergency containment work. Weak answers also show up in SOC 2 narratives, vendor reviews, cyber-insurance conversations, and incident-retrospective questions from leadership.
The business lesson is simple: production AI risk is rarely just a model issue. It is a trust, data-flow, and operational-readiness issue that becomes visible the moment a customer asks how the feature fails.
Why scanners fail to answer these questions
Traditional scanners can tell you whether a package is outdated, a header is missing, or a public endpoint exposes a known class of bug. They cannot tell you whether a support ticket can overwrite model intent, whether retrieved content is poisoning the answer, or whether a downstream tool should have been reachable by the assistant at all.
That gap exists because AI failures are behavioral and contextual. The exploit path lives across prompt hierarchy, retrieval ranking, tool invocation logic, output handling, and human review steps. A clean scanner result can coexist with an AI workflow that is one malicious document away from unsafe behavior.
This is why first-time AI launches often produce false confidence. The infrastructure may be tidy, but the workflow has not been adversarially tested as a system. You need evidence about trust boundaries and attack chains, not just infrastructure hygiene.
How Ciphvex helps
Ciphvex approaches an AI launch as an audit of the real production workflow. We map where untrusted inputs enter, what context the model receives, how retrieval is assembled, which tools can be called, where outputs flow next, and what telemetry exists when something goes wrong.
That means we do not stop at "prompt injection exists." We test the specific surfaces that matter to your feature: permission scoping for tools, retrieval pipeline integrity, output sanitization before execution or rendering, and the operational kill switches and incident runbooks that determine whether a bad model decision becomes a real security event.
The output is useful to a startup security team because it is practical and audit-ready. Instead of a vague scanner score, you get a written view of exploit paths, control gaps, and the highest-leverage fixes before launch pressure forces the organization to accept avoidable risk.
Request a free mini-scan before your first AI launch gets tested by production traffic instead of by your own team.
If you are shipping an LLM feature for the first time, Ciphvex can pressure-test the prompt injection surface, tool permissions, retrieval integrity, output handling, and incident readiness in a fast free mini-scan before launch.