Cybersecurity Research

Exploring the OWASP Top 10 vulnerabilities for Large Language Models.

The OWASP Top 10 for LLMs

This section provides a systematic overview of generative AI misuse scenarios, focusing on the critical security risks identified by OWASP. Each entry below provides a short technical summary of the vulnerability and its core remediations. For interested readers, the full detailed analyses can be found in the appendices.

LLM01

Prompt Injection

Description: Manipulating LLM behavior via crafted inputs to bypass safety filters or execute unauthorized actions (OWASP Foundation, 2025). This targets the model's reasoning process rather than a parser, allowing attackers to socially engineer the AI into unsafe behavior through direct (user input) or indirect instructions—where malicious commands are hidden in data sources like web pages or emails (Greshake et al., 2023).

Key Remediations:

Use strong, boundary-defining system prompts.
Enforce structured output formats (e.g., JSON schemas) and validate them.
Implement robust input/output filtering for suspicious patterns.
Apply the principle of least privilege for model tool access.

Read full technical analysis in Appendix 04 →

LLM02

Sensitive Info Disclosure

Description: Accidental revelation of confidential data, PII, or proprietary secrets through model outputs (OWASP Foundation, 2025). This can occur if sensitive data is included in training sets or if a model is manipulated into revealing its internal configuration or user-specific data.

Key Remediations:

Sanitize and scrub sensitive content from training datasets.
Implement robust input validation to detect extraction attempts.
Use privacy-preserving techniques like differential privacy or federated learning.
Maintain clear user policies on data usage and retention.

Read full technical analysis in Appendix 05 →

LLM03

Supply Chain Risks

Description: Vulnerabilities arising from third-party components, datasets, and pre-trained models (OWASP Foundation, 2025). Compromises at any point in the supply chain—from data scrapers to model hubs—can introduce hidden backdoors, biases, or malware.

Key Remediations:

Rigorously vet all third-party data and pre-trained model sources.
Maintain a Software Bill of Materials (SBOM) to track dependencies.
Verify component integrity using cryptographic hashes and signatures.
Conduct regular AI Red Teaming on integrated components.

Read full technical analysis in Appendix 06 →

LLM04

Model Poisoning

Description: Intentionally corrupting training data or fine-tuning processes to introduce vulnerabilities, backdoors, or biases. This compromises the model's fundamental reasoning and can be used to bypass security controls or spread misinformation (OWASP Foundation, 2025).

Key Remediations:

Strictly track and verify the lineage of all training data.
Implement anomaly detection to identify malicious patterns in datasets.
Use isolated sandboxing for all training and fine-tuning processes.
Rigorously vet data vendors and third-party dataset suppliers.

Read full technical analysis in Appendix 07 →

LLM05

Improper Output Handling

Description: Failure to validate, sanitize, and handle LLM-generated content before passing it to downstream systems (OWASP Foundation, 2025). This can enable traditional attacks like XSS, SQL injection, and Remote Code Execution (RCE) via the model's output.

Key Remediations:

Apply Zero-Trust principles: treat model output as untrusted user input.
Use context-aware encoding (e.g., HTML escaping, parameterized SQL queries).
Employ strict Content Security Policies (CSP) to mitigate XSS risks.
Sanitize all responses before passing them to backend functions or shells.

Read full technical analysis in Appendix 08 →

LLM06

Excessive Agency

Description: Granting models too much autonomy or access to sensitive functions (OWASP Foundation, 2025). When a model can call tools or APIs without sufficient oversight, unexpected or manipulated outputs can lead to damaging real-world actions.

Key Remediations:

Enforce the Principle of Least Privilege for all model-accessible tools.
Implement mandatory human-in-the-loop approval for high-impact actions.
Prefer specific, granular functions over broad, open-ended ones.
Perform authorization checks in downstream systems, not just the LLM.

Read full technical analysis in Appendix 09 →

LLM07

System Prompt Leakage

Description: Techniques used to extract the underlying system instructions (system prompts) that govern LLM behavior (OWASP Foundation, 2025). Disclosure reveals internal guardrails and logic, which can then be bypassed by attackers.

Key Remediations:

Externalize credentials—never include secrets or keys in system prompts.
Use external guardrails to filter output for leaked system instructions.
Implement multiple specialized agents with restricted, narrow prompts.
Do not rely solely on system prompts for security; enforce independent controls.

Read full technical analysis in Appendix 10 →

LLM08

Vector Weaknesses

Description: Security risks in systems using Retrieval Augmented Generation (RAG) (OWASP Foundation, 2025). Weaknesses in how vectors are generated, stored, or retrieved can lead to unauthorized data access or the injection of harmful knowledge; research has shown that injecting as few as five poisoned documents can dominate retrieval results (Zou et al., 2024).

Key Remediations:

Implement permission-aware retrieval within the vector database.
Validate and sanitize all documents before they are embedded.
Ensure logical partitioning between different users in shared vector stores.
Maintain immutable audit logs of all data retrieval activities.

Read full technical analysis in Appendix 11 →

LLM09

Misinformation

Description: The generation of factually incorrect or "hallucinated" information (OWASP Foundation, 2025). Over-reliance on unverified AI content can lead to security breaches, legal liability, and reputational damage.

Key Remediations:

Use RAG to ground model responses in verified, factual external data.
Implement mandatory human fact-checking for high-stakes information.
Clearly label AI-generated content and communicate model limitations.
Use automated tools to identify and flag potential fabricated content.

Read full technical analysis in Appendix 12 →

LLM10

Unbounded Consumption

Description: Resource exhaustion attacks (DoS) or "Denial of Wallet" via excessive inference requests (OWASP Foundation, 2025). High computational costs make LLMs targets for crashing services or depleting budgets.

Key Remediations:

Implement strict rate limiting and token quotas per user.
Enforce maximum input lengths and context window constraints.
Set hard budget caps and monitor resource consumption patterns.
Continuous resource monitoring to detect anomalous usage patterns.

Read full technical analysis in Appendix 13 →

Reflections

The research phase has been a critical exercise in understanding the vast and rapidly evolving field of security. By focusing on the OWASP Top 10 for LLMs, I was able to move beyond anecdotal "jailbreaks" and towards a professional taxonomy of risk. This process has highlighted that while many AI vulnerabilities are new, they often stem from age-old security failures—such as the lack of separation between data and instructions. Conducting this research has reinforced my ability to evaluate emerging technologies through a structured, risk-based lens.

One of the most interesting takeaways from researching the OWASP Top 10 LLM vulnerabilities has been how many of the weaknesses within AI systems stem from traditional security issues seen in web applications, such as injection attacks and improper input handling. However, what separates LLM vulnerabilities is that these attacks target not just systems, but reasoning itself. Prompt injection, for example, is essentially a form of social engineering directed at the model rather than a human. This emphasises that securing AI systems is not only about technical controls, but also about understanding how models interpret and prioritize instructions. As a result, defensive strategies must adapt to combine typical security principles with new approaches tailored to probabilistic and context-driven systems.

The research also reinforced the importance of defense-in-depth. No single mitigation—such as system prompts or output filtering—is sufficient on its own. Effective protection requires multiple overlapping controls, including input validation, access restrictions, monitoring, and human oversight.

Next Step: Architectural Analysis

After exploring the individual vulnerabilities, we can now synthesize these findings into a holistic view of the GenAI attack surface and its technical root causes.

View Vulnerability Analysis →