v0.1.0 · closed alpha · last benchmark 2026-05-05 · 7/10 long-tail · published

Scrapers that don't break
when the site ships a redesign.

A pay-as-you-go product-extraction API for the long tail of e-commerce. Adaptive selectors and per-host packs survive template drift — so your Tuesday-morning Slack stops filling up with "scraper broke again."

Get an API key · Read the docs · free tier · 1,000 req/mo · no card
~/scrape · POST /v1/scrape
# one URL → structured product JSON. tier 1 is enough 80% of the time.
$ curl -X POST https://api.triatomine.com/v1/scrape \
  -H "X-API-Key: tri_a4f9_••••••" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://allbirds.com/products/mens-wool-runners"}'

{
  "url": "https://allbirds.com/products/mens-wool-runners",
  "tier": 1,  // http only
  "elapsed_ms": 812,
  "product": {
    "name": "Men's Wool Runners — Natural Black",
    "sku": "WR-M-NB-10",
    "price": { "amount": 98.00, "currency": "USD" },
    "availability": "InStock",
    "brand": "Allbirds",
    "image": "https://cdn.shopify.com/.../wr-m-nb.jpg"
  },
  "selector_source": "jsonld",  // → microdata → meta → sitepack → llm
  "metered": true,
  "cost_usd": 0.001
}
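The same call can be sketched in Python with only the standard library. This is an illustration, not the official client: the function names `build_scrape_request`, `extract_price`, and `scrape` are hypothetical, and the JSON content type is an assumption. The endpoint, the `X-API-Key` header, and the response shape are taken from the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.triatomine.com/v1/scrape"  # endpoint from the curl example


def build_scrape_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
    )


def extract_price(payload: dict) -> tuple[float, str]:
    """Pull (amount, currency) out of a response shaped like the sample."""
    price = payload["product"]["price"]
    return float(price["amount"]), price["currency"]


def scrape(target_url: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (performs network I/O)."""
    req = build_scrape_request(target_url, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Keeping request construction separate from the network call makes the request easy to inspect or log before any metered traffic is sent.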
Order: Hemiptera
Family: Reduviidae
Subfamily: Triatominae
Habit: nocturnal · patient · adaptive
SPECIMEN 01
curl_cffi · selectolax · schema.org / JSON-LD · microdata · __NEXT_DATA__ · playwright · patchright + browserforge · adaptive relocator · per-host sitepacks (YAML) · ollama qwen2.5 fallback · per-host kill-switch
01 · The problem

Most scraping providers will quietly fail you the day a site rebuilds its product page.

Scrapers are written against a CSS selector. The site ships a redesign. Your selector matches nothing. The provider returns a 200 with empty fields. Your dashboard fills with $0 prices and "out of stock" everywhere. Nobody pages you because the API didn't error — it just lied politely.

"We were paying ScrapingBee $499/mo. They claim 99% success. Then Brand X redesigned their PDP. Our pricing dashboard read $0.00 for nine days before anyone noticed. Nobody errored — every request was a 200."

— Pricing analyst, ~30-person DTC. Triatomine alpha customer #2.
brandx.com / pdp.template.liquid · +1 / −1
  <div class="product-detail">
-   <span class="price js-price">$98.00</span>
+   <span data-pricing="current">$98.00</span>
    <p class="availability">In stock</p>
  </div>
Most providers
selector ".js-price" → ∅
return 200, empty payload, charge you.
Triatomine
relocator → siblings + JSON-LD
re-pins the field, repacks the YAML.
02 · Tiered cascade

Cheap when the site is easy. Premium only when the site fights back.

Every request walks the cheapest tier first. Tier 1 is plain HTTP — fast, $0.001. We escalate to Tier 2 (headless) only if Tier 1 can't extract a Schema.org Product. Tier 3 (stealth) only if Tier 2 is blocked. You're billed for the tier that succeeded; failures are never metered.

Tier | Stack | Handles | Avg latency | Per request
01 · HTTP | curl_cffi · selectolax · JSON-LD | ~80% of e-commerce. Static HTML, sitemap walks, server-rendered Schema.org Product. | | $0.001 per success
02 · Headless | Playwright · network-idle waits | Hydrated SPAs and catalogs that load price client-side. Triggered when Tier 1 returns no product signal. | | $0.004 per success
03 · Stealth | patchright · browserforge fingerprints | Hosts behind Cloudflare, Akamai, PerimeterX/HUMAN. Async-only. Forced when Tier 2 trips a challenge. | | $0.015 per success
04 · Real Chrome | CDP attach to user profile | Logged-in pages and the most fortified hosts. Out of scope for v1: premium gated, manual approval. Coming v2. | | v2 stretch
FAILURE POLICY
Failure = HTTP ≥500, anti-bot block, or no Schema.org Product extracted. Failures are never metered. You see {"status":"failed","tier_attempted":3,"reason":"anti_bot_block","metered":false} and your usage counter doesn't move.
Read the spec →
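The cascade and failure policy can be sketched as a loop over tiers, cheapest first. This is a minimal illustration under assumptions, not the engine itself: `TierOutcome`, `Fetcher`, `run_cascade`, and the `"no_product"` reason string are hypothetical names. The per-success prices and the failure payload fields come from the text above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Per-success prices from the tier table (USD).
TIER_PRICE_USD = {1: 0.001, 2: 0.004, 3: 0.015}


@dataclass
class TierOutcome:
    product: Optional[dict] = None  # Schema.org Product fields, if extracted
    blocked: bool = False           # True when an anti-bot challenge was hit


# Hypothetical fetcher signature: takes a URL, returns a TierOutcome.
Fetcher = Callable[[str], TierOutcome]


def run_cascade(url: str, fetchers: dict[int, Fetcher]) -> dict:
    """Walk tiers cheapest-first; bill only the tier that succeeds."""
    reason = "no_product"  # hypothetical reason code
    last_tier = 0
    for tier in sorted(fetchers):
        last_tier = tier
        outcome = fetchers[tier](url)
        if outcome.product is not None:
            return {
                "url": url,
                "tier": tier,
                "product": outcome.product,
                "metered": True,
                "cost_usd": TIER_PRICE_USD[tier],
            }
        reason = "anti_bot_block" if outcome.blocked else "no_product"
    # Every tier failed: structured failure, never metered.
    return {
        "status": "failed",
        "tier_attempted": last_tier,
        "reason": reason,
        "metered": False,
    }
```

Because billing is decided at the point of success, a request that falls through every tier has no path that sets `metered` to true.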
03 · The wedge

The relocator: a fingerprint of every selector, repaired in place on drift.

When a tier-1 selector fails, we don't return empty. We similarity-score the surrounding DOM against the last known fingerprint — tag, parent path, sibling structure, attribute shape, text neighborhood — and re-pin the field. The new selector is committed back to your sitepack. Next request: cheap and accurate.

incident.brandx_com.template_drift commit a4f9c21 → b8e012f · diff captured 2026-04-29 08:14 UTC
BEFORE commit a4f9c21 · stable for 142 days

The selector that worked.

<div class="product-detail">
  <span class="price js-price">$98.00</span>
  <p class="availability">In stock</p>
</div>
sitepack: .js-price
tier: 1 · http
extraction: ✓ price=98.00
elapsed: 0.81s
AFTER commit b8e012f · template push, .js-price removed

The selector died. Triatomine didn't.

<div class="product-detail">
  <span data-pricing="current">$98.00</span>
  <p data-availability="in_stock">In stock</p>
</div>
relocator: sim 0.94 → repinned
new selector: [data-pricing="current"]
extraction: ✓ price=98.00
sitepack: auto-committed
i. Fingerprint at extraction time

Every successful tier-1 hit captures the matched element's tag, path-to-root, sibling shape, attribute keys, and a 4-token text neighborhood.

ii. Score on drift

If the next run misses, candidate elements are scored against the stored fingerprint. Above 0.85 we re-pin; below, we walk to JSON-LD or escalate the tier.

iii. Heal the sitepack

The new selector and updated fingerprint are written to the per-host YAML. The next 1,000 requests for that host are tier-1 cheap again.
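The three steps above can be sketched as a weighted similarity over fingerprint features. The core engine is closed-source, so everything here is an assumption made for illustration: the `Fingerprint` fields mirror the features named in the text, but the weights, the Jaccard overlap measure, and the function names are hypothetical. Only the 0.85 threshold comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

REPIN_THRESHOLD = 0.85  # from the text: above 0.85 the field is re-pinned

# Hypothetical feature weights; they sum to 1.0.
WEIGHTS = {"tag": 0.20, "path": 0.25, "siblings": 0.20, "attr_keys": 0.10, "text": 0.25}


@dataclass(frozen=True)
class Fingerprint:
    tag: str
    path: tuple[str, ...]         # tag names from root down to the parent
    siblings: tuple[str, ...]     # tag names of sibling elements
    attr_keys: frozenset[str]     # attribute names only, not values
    text_tokens: tuple[str, ...]  # short text neighborhood


def jaccard(a, b) -> float:
    """Set overlap in [0, 1]; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(old: Fingerprint, new: Fingerprint) -> float:
    """Weighted score of how closely a candidate matches the stored fingerprint."""
    scores = {
        "tag": 1.0 if old.tag == new.tag else 0.0,
        "path": 1.0 if old.path == new.path else 0.0,
        "siblings": jaccard(old.siblings, new.siblings),
        "attr_keys": jaccard(old.attr_keys, new.attr_keys),
        "text": jaccard(old.text_tokens, new.text_tokens),
    }
    return sum(WEIGHTS[name] * score for name, score in scores.items())


def repin(stored: Fingerprint, candidates: dict[str, Fingerprint]) -> Optional[tuple[str, float]]:
    """Return (selector, score) for the best candidate above the threshold, else None."""
    best = max(candidates.items(), key=lambda kv: similarity(stored, kv[1]), default=None)
    if best is None:
        return None
    selector, fingerprint = best
    score = similarity(stored, fingerprint)
    return (selector, score) if score > REPIN_THRESHOLD else None
```

A `None` result corresponds to the fallback described in step ii: walk to JSON-LD or escalate the tier rather than commit a low-confidence selector to the sitepack.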

04 · Live benchmark

We publish what works and what doesn't. Most providers won't.

Run nightly against a fixed 10-host long-tail panel. We tell you exactly which Shopify-stack stores resolve cleanly on Tier 1+2, and which ones currently sit behind Cloudflare's edge and don't. The number we publish is the number you pay against.

7/10 · tier 1+2 · 2026-05-05

Long-tail DTC panel · 70% extraction rate. The 3 misses are concentrated on stores running Cloudflare in front of Shopify. We don't claim 99%. The ones who do are lying or testing on easy mode.

Panel: 10 mid-sized DTC stores · Shopify / Shopify+ / BigCommerce
Method: scripts/tier3_spike.py · tiers 1,2 · no proxies · no solvers
Frequency: nightly · all results published, regressions flagged
Source: SUPPORTED_HOSTS.md →
Host | Platform | Outcome | Latency
allbirds.com | Shopify | ✓ extracts product | ~1s
bombas.com | Shopify | ✓ extracts product | ~5s
casper.com | Shopify | ✓ extracts product | ~7s
hellotushy.com | Shopify | ✓ extracts product | ~7s
kettleandfire.com | Shopify Plus | ✓ extracts product | ~6s
outdoorvoices.com | Shopify | ✓ extracts product | ~1s
rothys.com | Shopify | ✓ extracts product | ~13s
brooklinen.com | Shopify + Cloudflare | ✗ anti_bot_block · hcaptcha |
deathwishcoffee.com | Shopify + Cloudflare | ✗ anti_bot_block · hcaptcha |
solostove.com | BigCommerce + CF | ✗ anti_bot_block · CF interstitial |
05 · Test before you commit

What works. What doesn't. What we won't promise.

If your target sites are mostly column 1, you're our customer. If they're column 3, we're not — and we'd rather tell you now than after you've integrated.

Works · ship now

Long-tail DTC stores on vanilla Shopify, Woo, Magento, BigCommerce.

The 80% of e-commerce that server-renders Schema.org JSON-LD or microdata. Tier 1 catches it. Adaptive selectors keep it caught when the theme rolls.

allbirds.com · rothys.com · bombas.com · casper.com · kettleandfire.com · + ~3,800 others on the panel
Sometimes · varies

Hydrated SPAs and JS-rendered catalogs.

Tier 2 (Playwright) handles most. Highly client-side stores that gate price behind sign-in or geo will sometimes need a sitepack. We'll write one with you on the Growth plan and up.

Headless Shopify storefronts · Next.js commerce w/ ISR · geo-gated PDPs
Out of scope · v1

Walmart-class retailers behind enterprise anti-bot.

Akamai BotManager, PerimeterX/HUMAN, layered defenses. Our last benchmark scored 0/10. Don't sign up for these. We're tracking residential proxies + per-provider solvers for v2.

walmart.com · target.com · bestbuy.com · macys.com · nordstrom.com · sephora.com
06 · Pricing

Base subscription, metered overage. Failures never billed.

Pick a plan with included requests. Overage is per-tier — you pay tier-1 rates for the static stuff and tier-3 only when the site actually fights you. Most teams pick Growth.

Hacker (Free) · $0/mo
  • 1,000 requests / mo
  • tier 1 + 2 only
  • 1 req/sec rate limit
  • no card · no email gate to start
  • community Discord
Start free

Starter · $29/mo
  • 25,000 requests / mo
  • tier 1 + 2 + 3
  • $0.002 / $0.008 / $0.020 overage
  • 10 req/sec
  • webhook delivery
Start Starter

Growth · $99/mo · Most teams pick this
  • 150,000 requests / mo
  • $0.0015 / $0.006 / $0.015 overage
  • batch endpoint · 100 URLs
  • per-host sitepacks on request
  • email support · 1-business-day SLA
Start Growth

Scale · $299/mo
  • 750,000 requests / mo
  • $0.001 / $0.004 / $0.010 overage
  • async + webhooks
  • custom sitepacks bundled
  • slack / wire-transfer billing
Start Scale
Tier overage rates: tier-1 / tier-2 / tier-3 pricing in that order. Tier-2 ≈ 4× tier-1, tier-3 ≈ 10–15×.
Failure refunds: Anti-bot block, no Schema.org Product, transport error → not metered. Visible on the usage page.
Annual billing: 20% off if you prepay annually on Starter+. Wire transfer accepted on Scale.
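As a worked example of the billing model, a monthly bill is the plan's base price plus per-tier overage. The sketch below uses the base prices and overage rates listed on the plan cards. How included requests are allocated across tiers is not specified here, so the hypothetical `monthly_bill` function takes the overage counts per tier as input instead of deriving them.

```python
from decimal import Decimal

# Base price and per-tier overage rates (USD), copied from the plan cards.
PLANS = {
    "starter": {
        "base": Decimal("29"),
        "overage": {1: Decimal("0.002"), 2: Decimal("0.008"), 3: Decimal("0.020")},
    },
    "growth": {
        "base": Decimal("99"),
        "overage": {1: Decimal("0.0015"), 2: Decimal("0.006"), 3: Decimal("0.015")},
    },
    "scale": {
        "base": Decimal("299"),
        "overage": {1: Decimal("0.001"), 2: Decimal("0.004"), 3: Decimal("0.010")},
    },
}


def monthly_bill(plan: str, overage_requests: dict[int, int]) -> Decimal:
    """Base subscription plus overage, where overage_requests maps tier -> count.

    Counts are successful requests beyond the included quota; failures are
    never metered, so they should not appear in the counts.
    """
    rates = PLANS[plan]["overage"]
    extra = sum(
        (rates[tier] * count for tier, count in overage_requests.items()),
        Decimal("0"),
    )
    return PLANS[plan]["base"] + extra


# Growth with 10,000 extra tier-1 and 1,000 extra tier-3 successes:
# 99 + 10,000 × 0.0015 + 1,000 × 0.015 = 129
print(monthly_bill("growth", {1: 10_000, 3: 1_000}))
```

`Decimal` is used in place of floats so that sub-cent rates add up exactly.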
07 · Real questions

From real customer DMs.

If we don't answer your question here, hit reply on the welcome email. A human reads it.

Q.01

What about Walmart, Target, Best Buy?

Honest answer: we don't reliably scrape them on v1. Our public benchmark scored 0/10 against that population in our last run. Akamai BotManager and PerimeterX are not solved with browser-fingerprint rotation alone — they need residential-proxy egress and provider-specific solvers. That's v2 territory. Don't sign up for these today.

Q.02

How is this different from ScrapingBee, Apify, Zyte?

Three things. (1) Tier-1 is genuinely cheap: $0.001 on Scale vs $0.005 on most competitors. (2) Adaptive selectors that survive template drift; we cache element fingerprints and re-pin them on the new DOM. (3) Honest, published per-host reliability. Most providers advertise 99%; we say 70% on the long tail and tell you which hosts. We don't have residential proxies or tier-4 yet.

Q.03

What counts as a "failure"?

Three cases: HTTP ≥500, anti-bot block at any tier, or no Schema.org Product extracted after the full cascade. Failures return a structured payload — {"status":"failed","tier_attempted":N,"reason":"...","metered":false} — and don't count against your usage.
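On the client side, the structured failure payload can be turned into a typed value so that a failed scrape is never mistaken for an empty product. This is a minimal sketch: `ScrapeFailure` and `parse_failure` are hypothetical names, and only the four payload fields shown above are assumed to exist.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScrapeFailure:
    tier_attempted: int
    reason: str
    metered: bool


def parse_failure(payload: dict) -> Optional[ScrapeFailure]:
    """Return a ScrapeFailure for a failed payload, or None for anything else."""
    if payload.get("status") != "failed":
        return None
    return ScrapeFailure(
        tier_attempted=int(payload["tier_attempted"]),
        reason=str(payload["reason"]),
        metered=bool(payload["metered"]),
    )
```

A caller can branch on `failure.reason` (for example `"anti_bot_block"`) and confirm that `failure.metered` is false before reconciling against the usage page.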

Q.04

How long before my first 200?

Target is <90 seconds from signup. Email verification is required before your first scrape, not before signup, so you can read docs and copy the curl example without waiting on the inbox. Most alpha customers were posting requests within the first minute.

Q.05

Is this legal? What's your ToS posture?

We follow robots.txt by default and rate-limit per host. Our ToS forbids scraping auth-walled content, copyrighted media at scale, government sites, and anything you don't have a legitimate interest in. We run a per-customer host blocklist that auto-extends on abuse signals. If a target site sends a complaint, we kill the host within hours.

Q.06

Can I run the engine locally?

The Python + Node clients are MIT and on PyPI / npm. The core engine is closed-source — that's the moat. If you have a self-host requirement (compliance, air-gapped), email hill@triatomine.com and we'll talk about a Scale-tier on-prem deployment.

Built by people who got paged at 2 a.m.
by their own scraper. One time too many.

free tier · 1,000 req/mo · no card · live in 90 seconds
— hill & the colony, 2026