Make Claude Opus Punch Above Its Weight: Security Testing With Skills, Subagents and Smart Workflows

You do not need Claude's newest or largest model to do serious security work. With skills, parallel subagents, adversarial verification and the right tooling, a widely available model like Opus produces security analysis that rivals a top-tier frontier model. A practitioner's guide to the building blocks, the workflow, the open Trail of Bits skills, and the guardrails that keep the agent from becoming the vulnerability.

AI coding assistants have quietly become security tools. Used well, a model like Claude reads a whole codebase, reasons about how an attacker would abuse it, drives a static-analysis engine, and writes the fix, all in one session. Used carelessly, the same agent is tricked by one poisoned comment into running a command it never should. This guide is the setup that gets real value out of Claude for testing and review, and the guardrails that keep the agent itself from becoming the vulnerability.

The counterintuitive part first: the biggest lever is not which model you run, it is how you run it. A widely available model such as Claude Opus, placed in the right harness, decomposed tasks, parallel subagents, adversarial verification and real tooling, produces analysis people assume only the largest frontier model could. Everything below is that harness.

On this page: Why use Claude for security · The five building blocks · Install the skills (tutorial) · Anthropic's marketplace · The review workflow · The Trail of Bits blueprint · Guardrails that matter · Prompt injection · Using it responsibly · Watch · FAQ · Sources

At a glance

Setup beats the model. Orchestration, not raw horsepower, is the multiplier. Opus in this harness rivals the biggest frontier model used as a one-shot chatbot.
Learn five primitives and you can build almost any review workflow: skills, plugins, subagents, MCP tools and hooks.
The winning pattern is parallel review plus adversarial verification: many subagents hunt, independent agents try to disprove each finding before it is reported.
Trail of Bits ships an open, installable marketplace of security skills you can add in one command (below).
No single control stops prompt injection (OWASP's number-one LLM risk). Treat every file, ticket and tool response as attacker-controlled.

Why use Claude for security work

Traditional tooling is good at patterns and bad at context. A scanner says a function calls a dangerous routine; it cannot tell you whether the input reaching it is attacker-controlled three files away. Claude can hold that context, follow the data flow, and reason about exploitability the way a human reviewer does, only faster and across more files at once.

Reach for it when you need to:

Review code for vulnerabilities with full cross-file context
Threat-model a design or map trust boundaries and data flows
Triage and de-duplicate noisy scanner output
Write or refine detection and static-analysis rules
Draft a proof-of-concept test for a confirmed bug
Explain a finding clearly to the team that has to fix it

The shift that matters is chat to agent: Claude does not just answer, it runs tools, reads files and iterates. That is what makes it powerful for testing, and exactly why the rest of this guide spends as long on containment as on capability.

The five building blocks

These primitives do almost all the work. Learn what each is for and your setup stays clean and auditable.

Building block	What it is	Security use
Skills	Reusable capability modules in a SKILL.md file, auto-invoked when relevant or called like a command	Package a repeatable procedure once: a review checklist, a triage workflow, a rule-writing routine
Plugins & marketplaces	A bundle shipping skills, subagents, MCP servers and hooks together, installed from a marketplace	Distribute a whole security toolkit to a team in one install, namespaced so nothing collides
Subagents	Specialised agents with their own context window, system prompt and scoped tools	Run one reviewer per file or per vulnerability class in parallel; spawn a separate agent to attack a fix
MCP servers	The open Model Context Protocol, connecting external tools and data	Let Claude drive real tooling: a scanner, a browser, a ticket system, a database
Hooks	Code that fires at lifecycle points, such as before and after a tool runs	Enforce policy in code: block a dangerous command, redact secrets, audit every change

In one line: skills are knowledge and procedure, MCP brings tools and data, subagents give parallelism and isolation, hooks are enforcement, and plugins package and share all of it. The Claude Agent SDK (Python and TypeScript) exposes the same primitives for building a standalone pipeline.

Tutorial: install the security skills

Everything in this guide is open source and installs from inside Claude Code. The flow is always the same: add a marketplace once, then install the plugins you want. Three steps, end to end.

Add the Trail of Bits marketplace — the security-skills marketplace this whole guide is built around. Paste this into Claude Code (you only do it once):
/plugin marketplace add trailofbits/skills
That registers the marketplace from github.com/trailofbits/skills.
See what is inside it. Open the menu and choose "Browse and install plugins" to scroll all 40-plus skills:
/plugin menu
Install the skills you want by name (table below), then confirm they loaded:
/plugin list

These are the five skills this guide refers to. Click any name to read its source before you install it, then copy the command on the right.

Skill	What it does	Install command
differential-review	Reviews a set of code changes for risky additions, focused on what matters most: authentication, cryptography, value transfer and external calls.	/plugin install differential-review@trailofbits
static-analysis	Lets Claude drive Semgrep and CodeQL and interpret the results for you, instead of reading raw scanner output.	/plugin install static-analysis@trailofbits
semgrep-rule-creator	Helps you write and refine a custom Semgrep rule for a pattern you have just found.	/plugin install semgrep-rule-creator@trailofbits
variant-analysis	Takes one confirmed bug and hunts the codebase for the same mistake elsewhere, where a lot of real risk hides.	/plugin install variant-analysis@trailofbits
mutation-testing	Pressure-tests your test suite by mutating the code, checking the tests actually catch the bug they claim to.	/plugin install mutation-testing@trailofbits

Browse the full set, with source for every plugin, in the trailofbits/skills marketplace.

Prefer Anthropic's own marketplace?

Claude Code ships with Anthropic's official marketplace already switched on, so there is nothing to add. Browse it two ways:

In the terminal — run /plugin and open the Discover tab.
On the web — the full catalogue is at claude.com/plugins (source for every plugin: anthropics/claude-plugins-official).

Install anything from it by name:

/plugin install <plugin-name>@claude-plugins-official

One rule before you install anything: a skill is code, and a marketplace is only as trustworthy as its publisher. Open the skill's source and read it first — researchers have demonstrated malicious skills that hide instructions inside their own documentation.

The review workflow, step by step

The teams getting the most out of Claude do not prompt "find all the bugs." They run this loop.

Scope and authorise. Define what is in bounds, point the agent at the right target, and drop the context it needs (architecture, threat model, prior findings) into a project instruction file so it is not guessing.
Build context before hunting. Have the agent map the codebase, trust boundaries and data flows first. A finding is only as good as the reviewer's grasp of how input reaches the sink.
Fan out in parallel. Use subagents to cover ground at once, one per file, service or vulnerability class (injection, auth, secrets, unsafe deserialisation). Route heavy scanning to a faster, cheaper model.
Verify adversarially. The single most important step. For each candidate finding, spawn an independent agent whose only job is to disprove it. Keep only findings that survive. This is what separates a usable report from a wall of false positives.
Fix, then re-verify the fix. Run a separate check that the patch closes the issue and introduces nothing new. "It changed the code" is not "it fixed the bug."
Keep a human in the loop. Reading and analysis can run freely; writing, deploying, deleting or sending data out should need review.

Illustration of several AI agents reviewing code blocks in parallel while one verifies a flagged finding — The highest-value pattern: many agents review in parallel, then an independent agent tries to disprove each finding before it is reported. Illustration.

Want this on every pull request? Claude Code ships a /security-review command and an official GitHub Action, claude-code-security-review, that comments findings straight on the PR. Caveat: run it only on PRs you trust. Processing untrusted external PR content can expose your CI secrets to prompt injection, so it is not a drop-in for arbitrary outside contributions without isolation.

Why these are the blueprint

The skills you installed above are not a toy demo, they are how professional auditors at Trail of Bits actually work, written down. They let Claude drive Semgrep and CodeQL and interpret the output, chase a confirmed bug's variants across the codebase, and pressure-test results before you trust them.

The lesson, even if you build your own: good AI security work is decomposed into narrow, verifiable steps with tool support, not handed to the model as one open-ended request.

New to the offensive side? Trail of Bits also maintains an open CTF Field Guide that is a solid on-ramp.

The guardrails that matter

An agent that runs commands and reads your whole tree is powerful and dangerous in equal measure. Claude Code ships layers of control; the right posture uses all of them together (defence in depth), because none is enough alone.

Control	What it does	Its limit
Permission modes & approval	Read-only tools run freely; commands and writes prompt for approval, with per-project allow rules	Approve-once rules can be over-broad; review what you allow-list
Command allow / deny lists	Match permitted commands by pattern; deny rules block outright	Deny rules have had documented (patched) bypasses, so do not treat them as a hard boundary
Sandbox	OS-level limits (Seatbelt on macOS, bubblewrap on Linux) confining the filesystem and denying network by default	Contains blast radius; does not stop the agent being misled into a permitted-but-harmful action
Hooks	Programmatic enforcement before and after tool use: block, redact, audit, require checks	Only as good as the rules you write
Managed settings	Org-wide rules local users cannot override	Needs central administration to be meaningful

A practical baseline for security work:

Grant the narrowest tool set the task needs
Turn on the sandbox so executed tooling cannot reach beyond the target
Allow-list the specific commands your workflow uses instead of opening up shell access
Add a hook that blocks destructive operations and flags any attempt to read secrets or reach the network
Treat skills and plugins as code: install from sources you trust and read them first

Prompt injection: the risk you cannot design away

Prompt injection is the defining security problem of agentic AI, and OWASP ranks it as the number-one risk for LLM applications. The attack is simple: someone hides instructions where the agent will read them, in a code comment, commit message, issue, web page or API response, and the agent treats them as if they came from you. In security testing this is acute, because you are deliberately pointing the agent at hostile material.

There is no single setting that eliminates it. The realistic defence is layered:

Treat all content as untrusted. Files, tickets, scanner output and especially MCP tool results can be attacker-controlled. The agent should analyse them, not obey them.
Least privilege. A reviewer that cannot write files or reach the network cannot be made to exfiltrate code.
Isolation and layered controls. Run in a sandbox, separate analysis from anything sensitive, assume no single line of defence holds, the defence-in-depth posture Anthropic recommends for agents.
Review actions, not just output. Approving each consequential action is the backstop when an injection slips through.

A useful mental model: the model is a brilliant analyst who will believe anything it reads. You get the analysis by giving it access; you stay safe by making sure even a fully convinced agent cannot do real harm.

Using it responsibly

Everything here is dual-use, so the line is authorisation, not technique. Keep to these rules:

Test only systems you own or have explicit written permission to test
Never point the agent at a live third-party target "to see what it finds"
Keep findings confidential and disclose them responsibly
Scope engagements clearly and log what the agent did
Keep a named person accountable for any action that changes a system

Inside those lines, Claude is a force multiplier for defenders. Outside them, it is simply unauthorised access with better tooling.

Watch

A short, official walk-through of finding and fixing security vulnerabilities with Claude.

Frequently asked questions

Can Claude replace a human security reviewer? No, and treating it that way is the mistake. It is a force multiplier: it covers more ground and drafts the tedious parts, while a human scopes the work, judges findings and owns decisions. The adversarial-verification step exists precisely because the model produces false positives a person must adjudicate.

What is the difference between a skill and a plugin? A skill is a single reusable capability (a procedure or checklist), invokable as a slash command. A plugin is a package that can bundle many skills plus subagents, MCP servers and hooks, distributed through a marketplace so a team installs the whole toolkit at once.

Is it safe to point Claude at malicious code or a hostile web app? Only inside containment. Assume the target will try to inject instructions. Run with least privilege, in a sandbox, with no write or network access it does not need, and review any action it tries to take.

Which model should I use? Match the model to the task: a high-capability model for deep architectural reasoning, a balanced one for routine review, a fast cheap one for parallel scanning and subagents. And do not assume you need the newest or biggest model, the orchestration in this guide closes more of the gap than a model-tier upgrade does.

Make Claude Opus Punch Above Its Weight: Security Testing With Skills, Subagents and Smart Workflows

Why use Claude for security work

The five building blocks

Tutorial: install the security skills

Prefer Anthropic's own marketplace?

The review workflow, step by step

Why these are the blueprint

The guardrails that matter

Prompt injection: the risk you cannot design away

Using it responsibly

Watch

Frequently asked questions

Sources

Related articles

The AI Friend in Your Pocket: The Hidden Privacy and Safety Risks of Companion Chatbot Apps

Coding With AI? Catch the Security Gaps Before You Ship

Public Wi-Fi in 2026: What's Actually Risky and What Isn't