The Black-Box Side of AI Security
March 15, 2026
In February 2026, Anthropic released Claude Code Security - a tool that found 500+ vulnerabilities in open-source codebases by reading source code, including 22 Firefox vulnerabilities in two weeks[1]. That's white-box testing: you have the code, AI reviews it.
I wanted to see what happens from the other side. Black-box testing: no source code, no internal access. Just a URL and an AI that can reason about what it sees.
I built a Claude Code skill called hackprobe and pointed it at 17 SaaS platforms. It found 32 critical vulnerabilities, including 3 full account takeovers, with CVSS scores between 8.0 and 10.0.
White-box catches what's in your code. Black-box catches what's exposed to the internet. Both matter. Until now, only one had an AI-powered tool.
What we found
32 confirmed critical findings across 17 platforms. Every finding was verified with a proof-of-concept.
- CVSS 10.0 - 3 findings. Full account takeover.
- CVSS 9.5-9.8 - 5 findings. Mass data exposure.
- CVSS 9.0-9.4 - 8 findings. Payment and credential exposure.
- CVSS 8.5-8.9 - 10 findings. Subscription fraud, privilege escalation.
- CVSS 8.0-8.4 - 6 findings. Email spam, enumeration.
Five patterns recurred across the targets:
- Production API keys in public JavaScript - 12/17 platforms (71%)
- User emails leaked via HTTP Referer to ad trackers - 11/17 (65%)
- Unauthenticated analytics event injection - 8/17 (47%)
- CORS reflecting arbitrary origin with credentials - 7/17 (41%)
- User PII indexed in public web archives - 5/17 (29%)
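One of these patterns, CORS reflecting an arbitrary origin with credentials, is cheap to check. Here's a sketch of the core test with the header inspection factored out so it runs offline; in a live probe you would send a GET with a forged Origin header and inspect the response:

```python
# Minimal sketch of the CORS misconfiguration check. The forged origin
# value is illustrative; any attacker-controlled domain works.
def cors_misconfigured(response_headers: dict, forged_origin: str) -> bool:
    acao = response_headers.get("Access-Control-Allow-Origin", "")
    acac = response_headers.get("Access-Control-Allow-Credentials", "")
    # Reflecting an attacker-chosen origin while allowing credentials lets
    # any website read authenticated responses from the victim's session.
    # A wildcard "*" with credentials is rejected by browsers, so only the
    # reflected-origin case is exploitable.
    return acao == forged_origin and acac.lower() == "true"
```

If the server echoes back whatever origin you send and also sets `Access-Control-Allow-Credentials: true`, any page the victim visits can read their authenticated API responses.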
71% of platforms had production API keys sitting in their public JavaScript bundles. Not staging keys. Production keys that authenticate real backend API calls.
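Finding these keys is the easy part. A minimal sketch of the scan, assuming a handful of hypothetical patterns (real secret scanners like trufflehog or gitleaks ship hundreds):

```python
import re

# Illustrative key patterns; not hackprobe's actual ruleset.
KEY_PATTERNS = {
    "stripe_live": re.compile(r"sk_live_[0-9a-zA-Z]{24,}"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(
        r"""api[_-]?key["']?\s*[:=]\s*["']([0-9a-zA-Z_-]{20,})["']""", re.I
    ),
}

def scan_bundle(js_source: str) -> list[tuple[str, str]]:
    """Return (pattern_name, match) pairs found in a JS bundle."""
    hits = []
    for name, pattern in KEY_PATTERNS.items():
        for m in pattern.finditer(js_source):
            hits.append((name, m.group(0)))
    return hits
```

The interesting work starts after the match: deciding whether the key is live, what it authenticates, and what it can be chained with.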
These aren't obscure, low-severity findings. 3 were full account takeovers at CVSS 10.0. 8 exposed payment infrastructure without authentication. 5 leaked personal data through public web archives that anyone can access.
The common thread: none of these are known CVEs. No scanner has templates for them. They're business logic flaws that require understanding what an API field means, not pattern matching against a signature database.
Two findings stuck with me.
"The server just... saves it"
The skill's secret scanner found API keys in a platform's public JavaScript bundle. That alone isn't unusual - 71% of the platforms had the same problem. But the AI Orchestrator chained it with another finding: the key authenticates an endpoint that accepts a billing status field from the client. There's no server-side validation. You send a request saying "I'm a paid subscriber" and the server just... saves it. Returns it back. Your account is now premium.
I tested it. Created an account with a throwaway email. Sent the request. Refreshed the page. Paid subscriber. The platform's entire subscription revenue model can be bypassed with one API call by anyone who opens DevTools.
This is the kind of vulnerability that no scanner will ever find. There's no signature for "the server trusts the client's billing status." An LLM reading the API response can reason about what that field means and why accepting it from the client is wrong.
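The flaw reduces to a missing trust boundary. A minimal reconstruction in Python; the field name and plan values are my illustrative assumptions, not the platform's real API:

```python
# Hypothetical reconstruction of the flaw. "subscription_status" and the
# plan values are illustrative, not the real platform's schema.

def vulnerable_update(profile: dict, client_payload: dict) -> dict:
    # The bug: client-supplied fields are persisted verbatim, so a request
    # claiming {"subscription_status": "premium"} is simply saved.
    profile.update(client_payload)
    return profile

def patched_update(profile: dict, client_payload: dict) -> dict:
    # The fix: billing state is derived server-side (e.g. from the payment
    # provider's webhook), never accepted from the client.
    safe = {k: v for k, v in client_payload.items() if k != "subscription_status"}
    profile.update(safe)
    return profile
```

The one-line difference is the entire vulnerability: an allowlist of client-writable fields, enforced on the server.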
Invoices in a public bucket
The skill's cloud storage scan found an S3 bucket with public read and write access. Inside: hundreds of customer payment invoices. Each one contained an email address, partial card number, expiry date, payment amount, and subscription details.
The read access is bad enough - that's a PCI-DSS violation and a mandatory GDPR breach notification. But the write access is worse. An attacker could replace invoice files with malicious content that gets served to paying customers. Think: a legitimate-looking invoice PDF that installs malware, delivered through the company's own infrastructure.
The bucket name was guessable - just the company name with a common suffix. The skill tries common naming patterns automatically.
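The guessing step looks something like this; the suffix list is my assumption, not hackprobe's actual wordlist:

```python
# Illustrative sketch of the bucket-naming probe.
COMMON_SUFFIXES = ["", "-assets", "-backup", "-prod", "-uploads", "-invoices"]

def candidate_buckets(company: str) -> list[str]:
    """Generate plausible S3 bucket names from a company name."""
    name = company.lower().replace(" ", "-")
    return [f"{name}{suffix}" for suffix in COMMON_SUFFIXES]

def bucket_url(bucket: str) -> str:
    # Probing this URL: a public bucket answers 200 to an anonymous GET;
    # a 403 still confirms the bucket exists but is private.
    return f"https://{bucket}.s3.amazonaws.com/"
```

A few dozen candidates, one GET each, and a publicly listable bucket announces itself with an XML directory listing.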
The AI code generation problem
These findings exist in a context that makes them more urgent, not less.
41% of all code written in 2025 was AI-generated[2]. Google confirmed that more than 30% of its new code is now AI-generated[3]. 84% of developers use or plan to use AI coding tools, and 51% use them daily[4].
But only 3% of developers highly trust AI output[5]. And for good reason: approximately 48% of AI-generated code contains security vulnerabilities[6].
Researchers at Stanford found that developers using AI coding assistants wrote insecure code more often - on 4 out of 5 tasks - and were simultaneously more confident that their code was secure[7]. More bugs and more confidence. That's the dangerous combination.
Gartner predicts that by 2028, prompt-to-app approaches will increase software defects by 2500%[8].
Meanwhile, only 17-21% of small businesses perform regular security assessments[9]. The rest - roughly 80% - don't audit at all. The average data breach costs $4.44 million globally and $10.22 million in the United States[10].
The developers generating the most code with AI are creating the most security debt - and they're the least likely to audit it.
How hackprobe works
hackprobe is a Claude Code skill that orchestrates 25+ security tools with LLM reasoning across 6 stages - from OSINT and recon through automated scanning, browser analysis, injection testing, and deep analysis. The final stage is the key: an AI Orchestrator that reads all results and chains findings into exploit paths. That's how it connects "API key in JS bundle" to "unvalidated billing endpoint" to "free subscription" - reasoning that no template-based scanner can do.
It also mines Wayback Machine and CommonCrawl as a reconnaissance source. Cached API responses from years ago often contain user identifiers and tokens that still resolve to live data. No existing security tool does this systematically.
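The archive mining can be sketched with the Wayback Machine's public CDX API. Filtering to /api/ paths is my illustrative choice; cached JSON responses there are where stale tokens and user identifiers tend to live:

```python
from urllib.parse import urlencode

CDX_API = "https://web.archive.org/cdx/search/cdx"

def cdx_query(domain: str, filter_api: bool = True) -> str:
    """Build a Wayback CDX query listing archived URLs under a domain."""
    params = {
        "url": f"{domain}/*",          # everything archived under the domain
        "output": "json",
        "collapse": "urlkey",          # deduplicate repeated captures
        "fl": "original,timestamp,statuscode",
    }
    if filter_api:
        # Regex filter on the original URL; an assumption for illustration.
        params["filter"] = "original:.*/api/.*"
    return f"{CDX_API}?{urlencode(params)}"
```

Fetching that URL returns a JSON list of archived API endpoints with timestamps; the next step is replaying any identifiers found in the cached responses against the live service.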
Install the tools once, then run `claude /hackprobe https://your-target.com`. Full audit in about 15 minutes.
Limitations
- CLI tools required. 25+ security tools installed via `install.sh`. The skill orchestrates real pentest tooling, not toy implementations.
- Black-box only. Tests what an external attacker sees. No source code analysis.
- Human review required. All findings should be verified before acting on them.
- Single-target. One URL at a time.
The skill is open source
The irony is hard to miss. The developers who most need a security audit are the ones generating code with AI - and they're the same ones who can run this tool. A hackprobe run won't replace a professional penetration test. But it will catch the CVSS 8.0+ issues before they become a breach notification.
If 41% of code is AI-generated and 48% of it contains vulnerabilities, we're accumulating security debt faster than we can audit it. This is one approach to closing that gap. The skill is open source at github.com/nuromirzak/hackprobe.
This skill is for authorized security testing only. Run it only against applications you own or have explicit written permission to test. Unauthorized security testing is illegal in most jurisdictions.
[1] Anthropic, "Claude Code Security," February 2026. Found 500+ vulnerabilities in production open-source codebases, including 22 Firefox vulnerabilities in two weeks. anthropic.com/news/claude-code-security
[2] GitHub internal research, aggregated by Tenet: 41% of code in 2025 is AI-generated. wearetenet.com
[3] Sundar Pichai, Google/Alphabet Q3 2024 earnings call: ">25% of new code is AI-generated," updated to >30% by April 2025. Fortune
[4] Stack Overflow 2025 Developer Survey, 49,000+ respondents across 177 countries: 84% use or plan to use AI tools, 51% use them daily. stackoverflow.blog
[5] Stack Overflow 2025: Only 3% of developers "highly trust" AI output. 46% actively distrust it. survey.stackoverflow.co/2025/ai
[6] Snyk 2026 Developer Security Report: approximately 48% of AI-generated code contains vulnerabilities. go.snyk.io
[7] Perry, Srivastava, Kumar, Boneh, "Do Users Write More Insecure Code with AI Assistants?", ACM CCS '23, Stanford University. Participants with AI assistants wrote insecure solutions more often on 4/5 tasks and were more likely to believe their code was secure. arxiv.org/abs/2211.03622
[8] Gartner: "By 2028, prompt-to-app approaches will increase software defects by 2500%." armorcode.com
[9] Only 17-21% of small businesses perform regular security assessments. StrongDM 2026, QualySec 2025. strongdm.com, qualysec.com
[10] IBM "Cost of a Data Breach Report 2025": Global average $4.44M, US average $10.22M (all-time high). ibm.com/reports/data-breach