Make Claude Opus Punch Above Its Weight: Security Testing With Skills, Subagents and Smart Workflows

You do not need Claude's newest or largest model to do serious security work. With skills, parallel subagents, adversarial verification and the right tooling, a widely available model like Opus produces security analysis that rivals a top-tier frontier model. A practitioner's guide to the building blocks, the workflow, the open Trail of Bits skills, and the guardrails that keep the agent from becoming the vulnerability.
AI coding assistants have quietly become security tools. Used well, a model like Claude reads a whole codebase, reasons about how an attacker would abuse it, drives a static-analysis engine, and writes the fix, all in one session. Used carelessly, the same agent is tricked by one poisoned comment into running a command it never should. This guide is the setup that gets real value out of Claude for testing and review, and the guardrails that keep the agent itself from becoming the vulnerability.
The counterintuitive part first: the biggest lever is not which model you run, it is how you run it. A widely available model such as Claude Opus, placed in the right harness, decomposed tasks, parallel subagents, adversarial verification and real tooling, produces analysis people assume only the largest frontier model could. Everything below is that harness.
- Setup beats the model. Orchestration, not raw horsepower, is the multiplier. Opus in this harness rivals the biggest frontier model used as a one-shot chatbot.
- Learn five primitives and you can build almost any review workflow: skills, plugins, subagents, MCP tools and hooks.
- The winning pattern is parallel review plus adversarial verification: many subagents hunt, independent agents try to disprove each finding before it is reported.
- Trail of Bits ships an open, installable marketplace of security skills you can add in one command (below).
- No single control stops prompt injection (OWASP's number-one LLM risk). Treat every file, ticket and tool response as attacker-controlled.
Why use Claude for security work
Traditional tooling is good at patterns and bad at context. A scanner says a function calls a dangerous routine; it cannot tell you whether the input reaching it is attacker-controlled three files away. Claude can hold that context, follow the data flow, and reason about exploitability the way a human reviewer does, only faster and across more files at once.
Reach for it when you need to:
- Review code for vulnerabilities with full cross-file context
- Threat-model a design or map trust boundaries and data flows
- Triage and de-duplicate noisy scanner output
- Write or refine detection and static-analysis rules
- Draft a proof-of-concept test for a confirmed bug
- Explain a finding clearly to the team that has to fix it
The five building blocks
These primitives do almost all the work. Learn what each is for and your setup stays clean and auditable.
| Building block | What it is | Security use |
|---|---|---|
| Skills | Reusable capability modules in a SKILL.md file, auto-invoked when relevant or called like a command | Package a repeatable procedure once: a review checklist, a triage workflow, a rule-writing routine |
| Plugins & marketplaces | A bundle shipping skills, subagents, MCP servers and hooks together, installed from a marketplace | Distribute a whole security toolkit to a team in one install, namespaced so nothing collides |
| Subagents | Specialised agents with their own context window, system prompt and scoped tools | Run one reviewer per file or per vulnerability class in parallel; spawn a separate agent to attack a fix |
| MCP servers | The open Model Context Protocol, connecting external tools and data | Let Claude drive real tooling: a scanner, a browser, a ticket system, a database |
| Hooks | Code that fires at lifecycle points, such as before and after a tool runs | Enforce policy in code: block a dangerous command, redact secrets, audit every change |
In one line: skills are knowledge and procedure, MCP brings tools and data, subagents give parallelism and isolation, hooks are enforcement, and plugins package and share all of it. The Claude Agent SDK (Python and TypeScript) exposes the same primitives for building a standalone pipeline.
Tutorial: install the security skills
Everything in this guide is open source and installs from inside Claude Code. The flow is always the same: add a marketplace once, then install the plugins you want. Three steps, end to end.
- Add the Trail of Bits marketplace — the security-skills marketplace this whole guide is built around. Paste this into Claude Code (you only do it once):
/plugin marketplace add trailofbits/skillsThat registers the marketplace from github.com/trailofbits/skills.
- See what is inside it. Open the menu and choose "Browse and install plugins" to scroll all 40-plus skills:
/plugin menu
- Install the skills you want by name (table below), then confirm they loaded:
/plugin list
These are the five skills this guide refers to. Click any name to read its source before you install it, then copy the command on the right.
| Skill | What it does | Install command |
|---|---|---|
| differential-review | Reviews a set of code changes for risky additions, focused on what matters most: authentication, cryptography, value transfer and external calls. | /plugin install differential-review@trailofbits |
| static-analysis | Lets Claude drive Semgrep and CodeQL and interpret the results for you, instead of reading raw scanner output. | /plugin install static-analysis@trailofbits |
| semgrep-rule-creator | Helps you write and refine a custom Semgrep rule for a pattern you have just found. | /plugin install semgrep-rule-creator@trailofbits |
| variant-analysis | Takes one confirmed bug and hunts the codebase for the same mistake elsewhere, where a lot of real risk hides. | /plugin install variant-analysis@trailofbits |
| mutation-testing | Pressure-tests your test suite by mutating the code, checking the tests actually catch the bug they claim to. | /plugin install mutation-testing@trailofbits |
Browse the full set, with source for every plugin, in the trailofbits/skills marketplace.
Prefer Anthropic's own marketplace?
Claude Code ships with Anthropic's official marketplace already switched on, so there is nothing to add. Browse it two ways:
- In the terminal — run /plugin and open the Discover tab.
- On the web — the full catalogue is at claude.com/plugins (source for every plugin: anthropics/claude-plugins-official).
Install anything from it by name:
The review workflow, step by step
The teams getting the most out of Claude do not prompt "find all the bugs." They run this loop.
- Scope and authorise. Define what is in bounds, point the agent at the right target, and drop the context it needs (architecture, threat model, prior findings) into a project instruction file so it is not guessing.
- Build context before hunting. Have the agent map the codebase, trust boundaries and data flows first. A finding is only as good as the reviewer's grasp of how input reaches the sink.
- Fan out in parallel. Use subagents to cover ground at once, one per file, service or vulnerability class (injection, auth, secrets, unsafe deserialisation). Route heavy scanning to a faster, cheaper model.
- Verify adversarially. The single most important step. For each candidate finding, spawn an independent agent whose only job is to disprove it. Keep only findings that survive. This is what separates a usable report from a wall of false positives.
- Fix, then re-verify the fix. Run a separate check that the patch closes the issue and introduces nothing new. "It changed the code" is not "it fixed the bug."
- Keep a human in the loop. Reading and analysis can run freely; writing, deploying, deleting or sending data out should need review.

Why these are the blueprint
The skills you installed above are not a toy demo, they are how professional auditors at Trail of Bits actually work, written down. They let Claude drive Semgrep and CodeQL and interpret the output, chase a confirmed bug's variants across the codebase, and pressure-test results before you trust them.
New to the offensive side? Trail of Bits also maintains an open CTF Field Guide that is a solid on-ramp.
The guardrails that matter
An agent that runs commands and reads your whole tree is powerful and dangerous in equal measure. Claude Code ships layers of control; the right posture uses all of them together (defence in depth), because none is enough alone.
| Control | What it does | Its limit |
|---|---|---|
| Permission modes & approval | Read-only tools run freely; commands and writes prompt for approval, with per-project allow rules | Approve-once rules can be over-broad; review what you allow-list |
| Command allow / deny lists | Match permitted commands by pattern; deny rules block outright | Deny rules have had documented (patched) bypasses, so do not treat them as a hard boundary |
| Sandbox | OS-level limits (Seatbelt on macOS, bubblewrap on Linux) confining the filesystem and denying network by default | Contains blast radius; does not stop the agent being misled into a permitted-but-harmful action |
| Hooks | Programmatic enforcement before and after tool use: block, redact, audit, require checks | Only as good as the rules you write |
| Managed settings | Org-wide rules local users cannot override | Needs central administration to be meaningful |
A practical baseline for security work:
- Grant the narrowest tool set the task needs
- Turn on the sandbox so executed tooling cannot reach beyond the target
- Allow-list the specific commands your workflow uses instead of opening up shell access
- Add a hook that blocks destructive operations and flags any attempt to read secrets or reach the network
- Treat skills and plugins as code: install from sources you trust and read them first
Prompt injection: the risk you cannot design away
Prompt injection is the defining security problem of agentic AI, and OWASP ranks it as the number-one risk for LLM applications. The attack is simple: someone hides instructions where the agent will read them, in a code comment, commit message, issue, web page or API response, and the agent treats them as if they came from you. In security testing this is acute, because you are deliberately pointing the agent at hostile material.
There is no single setting that eliminates it. The realistic defence is layered:
- Treat all content as untrusted. Files, tickets, scanner output and especially MCP tool results can be attacker-controlled. The agent should analyse them, not obey them.
- Least privilege. A reviewer that cannot write files or reach the network cannot be made to exfiltrate code.
- Isolation and layered controls. Run in a sandbox, separate analysis from anything sensitive, assume no single line of defence holds, the defence-in-depth posture Anthropic recommends for agents.
- Review actions, not just output. Approving each consequential action is the backstop when an injection slips through.
Using it responsibly
Everything here is dual-use, so the line is authorisation, not technique. Keep to these rules:
- Test only systems you own or have explicit written permission to test
- Never point the agent at a live third-party target "to see what it finds"
- Keep findings confidential and disclose them responsibly
- Scope engagements clearly and log what the agent did
- Keep a named person accountable for any action that changes a system
Inside those lines, Claude is a force multiplier for defenders. Outside them, it is simply unauthorised access with better tooling.
Watch
A short, official walk-through of finding and fixing security vulnerabilities with Claude.
Frequently asked questions
Can Claude replace a human security reviewer? No, and treating it that way is the mistake. It is a force multiplier: it covers more ground and drafts the tedious parts, while a human scopes the work, judges findings and owns decisions. The adversarial-verification step exists precisely because the model produces false positives a person must adjudicate.
What is the difference between a skill and a plugin? A skill is a single reusable capability (a procedure or checklist), invokable as a slash command. A plugin is a package that can bundle many skills plus subagents, MCP servers and hooks, distributed through a marketplace so a team installs the whole toolkit at once.
Is it safe to point Claude at malicious code or a hostile web app? Only inside containment. Assume the target will try to inject instructions. Run with least privilege, in a sandbox, with no write or network access it does not need, and review any action it tries to take.
Which model should I use? Match the model to the task: a high-capability model for deep architectural reasoning, a balanced one for routine review, a fast cheap one for parallel scanning and subagents. And do not assume you need the newest or biggest model, the orchestration in this guide closes more of the gap than a model-tier upgrade does.
Sources
- Claude Code documentation — Skills
- Claude Code documentation — Plugins and marketplaces
- Claude Code documentation — Subagents
- Claude Code documentation — Hooks
- Anthropic Engineering — Claude Code sandboxing and permissions
- Claude Code documentation — Code review and the security-review GitHub Action
- Claude Agent SDK — Overview (Python and TypeScript)
- Anthropic — A framework for safe and trustworthy agents (defence in depth)
- Trail of Bits — Claude Code skills for security research and audit (open source, install via /plugin marketplace add trailofbits/skills)
- Anthropic — official Claude Code plugin marketplace (claude.com/plugins)
- OWASP — Top 10 for LLM Applications (prompt injection is LLM01)