Breaking My Own Infrastructure: 12 Days, 19 Findings, 3 False Positives
Twelve days ago I opened a terminal, pointed curl at our staging API, and started breaking things. I didn’t have a plan. I didn’t have a timeline. I just had coffee, paranoia, and a vague sense that “it probably works fine” wasn’t good enough anymore.
This is what I found.
The Series at a Glance
What started as “let me test this one form” turned into a full infrastructure security audit spanning the API layer, AWS IAM, S3 bucket policies, Cognito Identity Pools, CORS configuration, refresh token handling, TypeORM data leakage, and a Lambda function nobody remembered existed.
| # | Post | What I Found |
|---|---|---|
| 1 | From DevOps to DevSecOps | Why I started doing this in the first place |
| 2 | 5,000 Attack Vectors Later | Fuzzing every input field – scientific notation creates $730K invoices |
| 3 | Two curl Commands to Full S3 Access | Cognito handing out S3 credentials to anyone on the internet |
| 4 | The CORS Rabbit Hole | Wildcard subdomain matching in CORS configuration |
| 5 | Internal Fields Aren’t Internal | Mass assignment – I deleted my own account via profile update |
| 6 | XSS in Support Tickets | Unescaped payloads rendering on the agent dashboard |
| 7 | 3 AM Thousand Invoices | Unauthenticated invoice creation with no rate limiting |
| 8 | The Refresh Token That Wouldn’t Die | Old tokens never invalidated – 30 days of permanent access |
| 9 | I Can Read Everyone’s Invoices | Finance BOLA + auto_login token leak via raw entity joins |
| 10 | The Load Balancer That Trusted Everyone | X-Forwarded-For bypass – rate limiting protects nobody |
| 11 | What --dryrun Taught Me | Three false positives and the testing methodology I built after |
By the Numbers
- Endpoints tested: 377+
- Unique attack payloads: ~5,000
- Critical findings: 4 (Cognito S3 access, unauthenticated invoice creation, public bucket listings, finance BOLA with auto_login leak)
- High findings: 7 (mass assignment, Lambda signed-URL endpoint, missing auth on endpoints, CORS misconfiguration, refresh token reuse, X-Forwarded-For rate limit bypass, QB data leaked via addSelect)
- Medium findings: 6 (XSS in contact forms, input validation gaps, error message disclosure, missing rate limiting, sequential ID enumeration, multipart upload abuse)
- False positives caught: 3 (frontend bucket write, presigned URL bypass, unnecessary deny statements)
- Terraform files modified: 8
- Cups of coffee: lost count
- Hours of sleep lost: also lost count
What Actually Mattered
Not all findings are equal. Here’s what I’d fix first if I had to do it again:
Fix Immediately (same day)
1. Cognito S3 full access. Two curl commands to download 2.4 GB of confidential files. This was the worst finding – not because it was technically sophisticated, but because it had been there for over a year and nobody noticed. The fix was removing GetObject and DeleteObject from the Cognito IAM role. Thirty minutes.
2. Public bucket directory listings. Six buckets responding to ?list-type=2 with full file inventories. No credentials needed. Search engines like GrayhatWarfare index these automatically. The fix was a DenyPublicListBucket statement using aws:PrincipalAccount condition. One PR for all six buckets.
3. Unauthenticated invoice creation. A missing @Authorized() decorator meant anyone could create invoices via curl. Combined with scientific notation in the quantities field (1e3 = 1000 trophies), this was a financial risk. The fix was one decorator and proper input validation.
4. Finance BOLA + auto_login leak. Any authenticated user could read any finance record by sequential ID. Each record included the full QB invoice AND the raw User entity – including the auto_login passwordless auth token. Three layers of protection existed (select: false, @Exclude(), response DTO) and all three were bypassed.
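The scientific-notation bug in item 3 comes down to trusting `Number()`-style parsing on a quantity field. A minimal sketch of the validation fix, with illustrative names and an assumed cap of 1,000 units (the real endpoint, field names, and limits may differ):

```typescript
// Hypothetical validator: accept only plain decimal-digit quantities,
// so "1e3", "0x10", "1.5", and "-2" are all rejected before any
// invoice math happens.
function parseQuantity(raw: string): number {
  if (!/^\d+$/.test(raw)) {
    throw new Error(`invalid quantity: ${raw}`);
  }
  const qty = Number(raw);
  // Cap the range so a single request can't mint a $730K invoice.
  if (qty < 1 || qty > 1000) {
    throw new Error(`quantity out of range: ${qty}`);
  }
  return qty;
}
```

The key design choice is validating the raw string, not the parsed number: once `"1e3"` has been coerced to `1000`, the information that the client sent scientific notation is gone.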
Fix This Week
5. Mass assignment on profile update. The PUT endpoint accepted any field in the request body, including isDeleted, role, and isVerified. I soft-deleted my own account by sending {"isDeleted": true}. The fix was explicit DTOs with @Exclude() on internal fields.
6. Lambda signed-URL endpoint. After locking down S3, I found a Lambda behind API Gateway that generates CloudFront signed download URLs for anyone with Cognito credentials. No authorization check. No file ownership validation. This is the remaining read path for student PII files. The fix is in the Lambda code, not in S3 or IAM.
7. Refresh token reuse. Old refresh tokens are never invalidated. The same token can generate unlimited new access tokens for 30 days. Password changes don’t revoke existing tokens. One stolen token = 30 days of persistent access. The fix is refresh token rotation with family tracking.
8. X-Forwarded-For rate limit bypass. The ALB is directly accessible from the internet, bypassing CloudFront. With trust proxy: 2 and only 1 actual proxy hop, Express trusts the attacker’s X-Forwarded-For header. Rotating fake IPs bypasses all rate limiting. The fix is closing the ALB security group to CloudFront-only traffic.
Fix This Sprint
9. CORS wildcard subdomain matching. The API reflects any Origin matching *.example.com with Access-Control-Allow-Credentials: true. An attacker who compromises any subdomain (or registers a lookalike) can steal authenticated API responses.
10. Rate limiting. Multiple endpoints accept unlimited requests. The invoice endpoint is the most dangerous, but several others (contact form, login, password reset) also lack throttling.
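For item 9, the safest replacement for wildcard subdomain matching is an exact-match allow-list: no regex, no `endsWith(".example.com")`. A minimal sketch, with placeholder origins:

```typescript
// Hypothetical allow-list; the real deployment would list its actual
// frontend origins here.
const ALLOWED_ORIGINS = new Set([
  "https://app.example.com",
  "https://admin.example.com",
]);

// Returns the value for Access-Control-Allow-Origin, or null to omit
// the CORS headers entirely.
function corsOriginFor(requestOrigin: string | undefined): string | null {
  if (requestOrigin && ALLOWED_ORIGINS.has(requestOrigin)) {
    return requestOrigin;
  }
  return null;
}
```

Under this scheme a compromised subdomain like `https://evil.example.com` is simply not in the set, so it is never reflected back with `Access-Control-Allow-Credentials: true`.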
What I Got Wrong
Three false positives in nine days. That’s roughly a 19% false positive rate on my initial findings. Not great.
Every one had the same root cause: I tested the setup step and assumed the execution step would follow.
- `--dryrun` checks syntax, not IAM permissions
- `aws s3 presign` does local HMAC math, not an API call
- Adding deny statements where IAM already denies is complexity for free
I wrote a whole post about this (What --dryrun Taught Me) because it was the most important lesson of the entire engagement. A false positive erodes trust faster than a missed finding.
My new rule: every finding needs proof-of-execution, not proof-of-setup.
The AWS Permission Model (The Hard Way)
The Cognito/S3 investigation alone taught me more about AWS permissions than years of reading documentation:
- IAM policy + bucket policy = two independent gates. Either one can allow an action. Locking down one without the other leaves the door open.
- Cognito enhanced flow applies a session scope-down (`AmazonCognitoUnAuthedIdentitiesSessionPolicy`) that blocks S3 entirely from IAM policies. Your IAM grants are silently ignored.
- Bucket policies bypass the session scope-down. Resource-based policies are evaluated separately from identity-based policies. This is why the student bucket still worked after I locked down the IAM role.
- Condition keys are policy-type-specific. `s3:content-type` works in IAM policies but not bucket policies. `s3:prefix` works with `ListBucket` but not `ListBucketMultipartUploads`. Same syntax, different rules.
- CloudFront is not a security boundary unless you configure OAC. A “custom origin” pointing to an S3 website endpoint is just a CDN in front of a public bucket.
These aren’t in most tutorials. I learned them by breaking things and reading Terraform error messages at 1 AM.
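The "two independent gates" rule is the one that bit me hardest, so here it is as a toy model. This is a deliberate simplification of AWS's real evaluation logic (it ignores cross-account access, permissions boundaries, and SCPs), but it captures the same-account behavior described above: an explicit deny anywhere wins, and otherwise an allow in either policy is enough.

```typescript
type PolicyResult = "allow" | "deny" | "none";

// Simplified same-account S3 evaluation: identity (IAM) policy and
// resource (bucket) policy are two independent gates.
function s3AccessAllowed(iam: PolicyResult, bucket: PolicyResult): boolean {
  // An explicit deny in either policy always wins.
  if (iam === "deny" || bucket === "deny") return false;
  // Otherwise, an allow in EITHER policy grants access.
  return iam === "allow" || bucket === "allow";
}
```

This is why locking down the Cognito IAM role alone didn't close the student bucket: `s3AccessAllowed("none", "allow")` is still true.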
The Pattern
Looking back at twelve days of testing, there’s a pattern to how these vulnerabilities existed:
The code worked. Every single vulnerable endpoint did what it was supposed to do. Files uploaded. Invoices created. Profiles updated. Forms submitted. The developers weren’t wrong – they built working features.
Nobody tested the negative case. “What happens if I call this without logging in?” “What happens if I send 1e6 as a quantity?” “What happens if I include isDeleted in the request body?” These questions weren’t asked because the happy path worked.
Infrastructure and application security were treated as separate concerns. The API team secured the API. The infrastructure team secured the infrastructure. But the Cognito IAM policy sat at the intersection – and nobody reviewed it because it was “infrastructure” to the developers and “just for uploads” to the DevOps team.
Security controls were declared but not enforced. select: false was overridden by .addSelect(). @Exclude() was ignored because classTransformer wasn’t enabled. The CloudFront-only security group rule was drowned out by 0.0.0.0/0 rules. Three layers of protection on the finance data, all three bypassed. The pattern repeated everywhere: the intention was right, the enforcement was absent.
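One way to make "declared" into "enforced" is to stop relying on decorators to strip fields and instead copy an explicit allow-list into the response object. A sketch under assumed, illustrative field names (not the project's actual entities):

```typescript
interface UserEntity {
  id: number;
  email: string;
  role: string;
  autoLoginToken: string; // must never leave the server
  isDeleted: boolean;
}

interface UserResponseDto {
  id: number;
  email: string;
}

// Only fields named here can ever appear in a response. A new column
// added to the entity is excluded by default, even if someone later
// adds an .addSelect() or forgets an @Exclude().
function toUserResponse(user: UserEntity): UserResponseDto {
  return { id: user.id, email: user.email };
}
```

The difference from `@Exclude()` is structural: the safe behavior doesn't depend on a transformer being enabled, it's the only code path that exists.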
What Changed
After twelve days of pentesting:
- S3 buckets are scoped to known prefixes with content-type restrictions
- Public bucket listings are blocked across all environments
- Three endpoints got `@Authorized()` decorators
- Input validation was tightened on financial endpoints
- DTOs explicitly exclude internal fields
- I have a Claude Code skill with 350+ testing techniques for future pentests
- The team now asks “what if someone calls this without auth?” during code reviews
What’s Still Open
I’m being honest about what’s not fixed yet:
- Finance BOLA + auto_login leak – any auth user can read any finance record and extract passwordless auth tokens
- Refresh token rotation – old tokens never invalidated, 30-day persistent access from stolen tokens
- ALB security group – direct access bypasses CloudFront, nullifying rate limiting via X-Forwarded-For
- Lambda signed-URL endpoint needs authorization logic (the real remaining read path)
- Presigned URL migration to eliminate Cognito credentials in the browser entirely
- CloudFront OAC migration to stop serving the frontend bucket as a public S3 website
- Rate limiting across all public endpoints
- JWT claims – missing `aud`/`iss` means tokens from any frontend work on all frontends
- KMS permissions on the Cognito role (`GenerateDataKey`/`Decrypt` work but are unexploitable without `GetObject`)
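The missing `aud`/`iss` check from that list is a small amount of code once signature verification is in place. A sketch with placeholder issuer and audience values (the real expected values would come from configuration):

```typescript
interface JwtClaims {
  iss?: string;
  aud?: string | string[]; // RFC 7519 allows a single value or an array
}

// Run AFTER signature verification: confirm the token was minted by
// the expected issuer for this specific frontend.
function claimsAcceptable(
  claims: JwtClaims,
  expectedIss: string,
  expectedAud: string
): boolean {
  if (claims.iss !== expectedIss) return false;
  const aud = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  return aud.includes(expectedAud);
}
```

Without this check, a valid signature is the only gate, which is exactly why a token issued for one frontend currently works on all of them.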
Security isn’t a checkbox. It’s a backlog. The difference between “secure” and “insecure” isn’t whether the backlog is empty – it’s whether you know what’s on it.
For Other DevOps Engineers
If you’re thinking about making the shift to DevSecOps, or even just starting to test your own infrastructure:
- Start with `curl`. No tools. No scanners. Just `curl -s -X POST` and see what happens. You’ll be surprised.
- Test staging, not production. I cannot stress this enough. Everything in this series was tested against staging. The one time I accidentally pointed at production, I stopped immediately.
- Document everything, including mistakes. My false positives are in the report. My corrections are timestamped. The team trusts the report more because of the corrections, not less.
- Fix incrementally. Phase 0 took 30 minutes and stopped the worst-case scenario. The full fix took a week. Ship the 30-minute fix first.
- Check both sides. IAM policy AND bucket policy. Authentication AND authorization. Input validation AND output encoding. The vulnerability is always at the intersection.
The Linux Connection
If you’re new to security and want to understand the foundation that all of this sits on, I wrote a guide on Linux file permissions last year. The concepts are the same at every level – users, groups, permissions, the principle of least privilege. AWS IAM is just Linux permissions with more YAML.
What’s Next
I’m not stopping. The landscape changes too fast. AI is generating exploit scripts now. The window between “vulnerability introduced” and “vulnerability exploited” is shrinking.
But I’m also not panicking. Because now I have a methodology. I have a skill file with 350+ techniques. I have a team that takes security seriously. And I have a blog where I document everything – including the parts where I look silly.
That’s the real shift from DevOps to DevSecOps. Not the tools. Not the certifications. The willingness to break your own stuff, admit what you got wrong, and fix it before someone else finds it.
Now go run `curl -s "https://YOUR-BUCKET.s3.amazonaws.com/?list-type=2"` against your own buckets. I’ll wait.
This is the final post in the Breaking My Own Infrastructure series. If you’ve been following along, thank you. If you just found this, start from Part 1.