A pay-as-you-go product-extraction API for the long tail of e-commerce. Adaptive selectors and per-host packs survive template drift — so your Tuesday-morning Slack stops filling up with "scraper broke again."
Scrapers are written against a CSS selector. The site ships a redesign. Your selector matches nothing. The provider returns a 200 with empty fields. Your dashboard fills with $0 prices and "out of stock" everywhere. Nobody pages you because the API didn't error — it just lied politely.
"We were paying ScrapingBee $499/mo. They claim 99% success. Then Brand X redesigned their PDP. Our pricing dashboard read $0.00 for nine days before anyone noticed. Nobody errored — every request was a 200."
Every request walks the cheapest tier first. Tier 1 is plain HTTP — fast, $0.001. We escalate to Tier 2 (headless) only if Tier 1 can't extract a Schema.org Product. Tier 3 (stealth) only if Tier 2 is blocked. You're billed for the tier that succeeded; failures are never metered.
A failed run returns `{"status":"failed","tier_attempted":3,"reason":"anti_bot_block","metered":false}` and your usage counter doesn't move. When a tier-1 selector fails, we don't return empty. We similarity-score the surrounding DOM against the last known fingerprint — tag, parent path, sibling structure, attribute shape, text neighborhood — and re-pin the field. The new selector is committed back to your sitepack. Next request: cheap and accurate.
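The cascade above can be sketched as a simple loop. This is an illustration, not the real engine: the fetcher signature, the `BLOCKED` sentinel, and the `no_product_extracted` reason string are assumed conventions.

```python
# Illustrative sketch of the tiered cascade: try the cheapest fetcher
# first, escalate on a block or a failed extraction, and never meter
# a request that ends in failure. Not the real API.
BLOCKED = "blocked"  # assumed sentinel for an anti-bot block at a tier

def cascade(url, fetchers):
    """fetchers: callables per tier, each returning a Product dict,
    BLOCKED, or None (nothing extracted)."""
    last, tier = None, 0
    for tier, fetch in enumerate(fetchers, start=1):
        last = fetch(url)
        if last == BLOCKED or last is None:
            continue  # escalate to the next (more expensive) tier
        # billed at the tier that actually succeeded
        return {"status": "ok", "tier": tier, "product": last, "metered": True}
    reason = "anti_bot_block" if last == BLOCKED else "no_product_extracted"
    return {"status": "failed", "tier_attempted": tier,
            "reason": reason, "metered": False}
```

A request that tier 1 misses but tier 2 lands returns `"tier": 2` and is metered; a run blocked at every tier comes back unmetered with the failure payload shown above.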
Every successful tier-1 hit captures the matched element's tag, path-to-root, sibling shape, attribute keys, and a 4-token text neighborhood.
If next run misses, candidate elements are scored against the stored fingerprint. Above 0.85 we re-pin; below, we walk to JSON-LD or escalate the tier.
The new selector and updated fingerprint are written to the per-host YAML. The next 1,000 requests for that host are tier-1 cheap again.
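A minimal sketch of that scoring step, assuming Jaccard overlap for the set-like fields and exact match for the scalar ones. The field names and the 0.85 threshold come from the text above; the per-field weights are invented for illustration.

```python
# Sketch of fingerprint similarity + re-pin. Field names mirror the
# captured fingerprint; the weights below are assumptions, not the
# production values.
WEIGHTS = {"tag": 0.2, "parent_path": 0.25, "sibling_shape": 0.2,
           "attr_keys": 0.15, "text_neighborhood": 0.2}

def similarity(stored, candidate):
    score = 0.0
    for field, w in WEIGHTS.items():
        a, b = stored.get(field), candidate.get(field)
        if isinstance(a, (list, set, tuple)):  # set-like field: Jaccard overlap
            sa, sb = set(a or ()), set(b or ())
            score += w * (len(sa & sb) / len(sa | sb) if sa | sb else 1.0)
        else:  # scalar field: exact match
            score += w * (1.0 if a == b else 0.0)
    return score

def repin(stored, candidates, threshold=0.85):
    """Return the best candidate's selector, or None to escalate."""
    best = max(candidates, key=lambda c: similarity(stored, c["fingerprint"]),
               default=None)
    if best and similarity(stored, best["fingerprint"]) >= threshold:
        return best["selector"]  # re-pin the field to the new selector
    return None  # below threshold: walk to JSON-LD or escalate the tier
```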
The benchmark runs nightly against a fixed 10-host long-tail panel. We tell you exactly which Shopify-stack stores resolve cleanly on Tier 1+2, and which ones currently sit behind Cloudflare's edge and don't. The number we publish is the number you pay against.
Long-tail DTC panel · 70% extraction rate. The 3 misses are concentrated on stores running Cloudflare in front of Shopify. We don't claim 99%. The ones who do are lying or testing on easy mode.
| Host | Platform | Outcome | Latency |
|---|---|---|---|
| allbirds.com | Shopify | ✓ extracts product | ~1s |
| bombas.com | Shopify | ✓ extracts product | ~5s |
| casper.com | Shopify | ✓ extracts product | ~7s |
| hellotushy.com | Shopify | ✓ extracts product | ~7s |
| kettleandfire.com | Shopify Plus | ✓ extracts product | ~6s |
| outdoorvoices.com | Shopify | ✓ extracts product | ~1s |
| rothys.com | Shopify | ✓ extracts product | ~13s |
| brooklinen.com | Shopify + Cloudflare | ✗ anti_bot_block · hcaptcha | — |
| deathwishcoffee.com | Shopify + Cloudflare | ✗ anti_bot_block · hcaptcha | — |
| solostove.com | BigCommerce + Cloudflare | ✗ anti_bot_block · CF interstitial | — |
If your target sites mostly fall in the first bucket below, you're our customer. If they're in the third, we're not — and we'd rather tell you now than after you've integrated.
The 80% of e-commerce that server-renders Schema.org JSON-LD or microdata. Tier 1 catches it. Adaptive selectors keep it caught when the theme rolls.
Tier 2 (Playwright) handles most of these. Heavily client-side stores that gate price behind sign-in or geo will sometimes need a sitepack. We'll write one with you on the Growth plan and up.
Akamai BotManager, PerimeterX/HUMAN, layered defenses. Our last benchmark scored 0/10. Don't sign up for these. We're tracking residential proxies + per-provider solvers for v2.
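The tier-1 path for that first bucket — server-rendered Schema.org JSON-LD — can be illustrated with a stdlib-only sketch. A real extractor would use a proper HTML parser and handle `@graph` nesting and `@type` arrays; the regex here is deliberately minimal.

```python
# Minimal sketch of tier-1 JSON-LD extraction using only the stdlib.
# Deliberately simplistic: no @graph handling, no microdata fallback.
import json
import re

def extract_product(html: str):
    """Return the first Schema.org Product object in a JSON-LD block, else None."""
    pattern = r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>'
    for m in re.finditer(pattern, html, re.DOTALL | re.IGNORECASE):
        try:
            data = json.loads(m.group(1))
        except json.JSONDecodeError:
            continue  # malformed block: skip, keep scanning
        nodes = data if isinstance(data, list) else [data]
        for node in nodes:
            if isinstance(node, dict) and node.get("@type") == "Product":
                return node
    return None  # nothing extracted: the cascade would escalate the tier
```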
Pick a plan with included requests. Overage is per-tier — you pay tier-1 rates for the static stuff and tier-3 only when the site actually fights you. Most teams pick Growth.
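As a worked example of per-tier overage, only the $0.001 tier-1 figure appears in this copy; the tier-2 and tier-3 rates below are made-up stand-ins to show the shape of the bill.

```python
# Hypothetical overage calculation. Only the tier-1 rate ($0.001)
# comes from the pricing copy; tiers 2 and 3 are invented here.
RATES = {1: 0.001, 2: 0.01, 3: 0.05}  # $/request; 2 and 3 are assumptions

def overage_cost(successes_by_tier):
    """successes_by_tier: {tier: count of successful, metered requests}.
    Failures are never metered, so they never appear here."""
    return sum(RATES[t] * n for t, n in successes_by_tier.items())
```

Under these made-up rates, a month where 9,000 overage requests resolve on tier 1, 800 on tier 2, and 200 on tier 3 bills 9.00 + 8.00 + 10.00 = $27.00 — most of the volume rides the cheap tier.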
If we don't answer your question here, hit reply on the welcome email. A human reads it.
Honest answer: we don't reliably scrape them on v1. Our public benchmark scored 0/10 against that population in our last run. Akamai BotManager and PerimeterX are not solved with browser-fingerprint rotation alone — they need residential-proxy egress and provider-specific solvers. That's v2 territory. Don't sign up for these today.
Three things. (1) Tier-1 is genuinely cheap — $0.001 on Scale vs $0.005 on most competitors. (2) Adaptive selectors that survive template drift; we cache element fingerprints and re-pin them on the new DOM. (3) Honest per-host reliability published. Most providers advertise 99%; we say 70% on the long tail and tell you which hosts. We don't have residential proxies or tier-4 yet.
Three cases: HTTP ≥500, anti-bot block at any tier, or no Schema.org Product extracted after the full cascade. Failures return a structured payload — `{"status":"failed","tier_attempted":N,"reason":"...","metered":false}` — and don't count against your usage.
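On the client side, that structured payload means you can fail loudly instead of writing $0.00 into a dashboard. A minimal guard, with names of our own choosing (`ExtractionFailed`, `unwrap` are not part of any shipped client):

```python
# Hedged client-side guard: treat a failed extraction as an exception,
# never as an empty product. Payload field names match the failure
# payload documented above; the helper names are illustrative.
class ExtractionFailed(Exception):
    pass

def unwrap(payload):
    """Return the product dict from a response, or raise on failure."""
    if payload.get("status") == "failed":
        # metered is false here: the request did not count against usage
        raise ExtractionFailed(
            f"tier {payload.get('tier_attempted')}: {payload.get('reason')}")
    return payload["product"]
```

The point of raising is operational: a blocked host pages someone instead of quietly flowing zeroes into a pricing dashboard for nine days.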
Target is <90 seconds from signup. Email verification is required before your first scrape, not before signup, so you can read docs and copy the curl example without waiting on the inbox. Most alpha customers were posting requests within the first minute.
We follow robots.txt by default and rate-limit per host. ToS forbids: scraping auth-walled content, copyrighted media at scale, government sites, and anything you don't have a legitimate interest in. We run a per-customer host blocklist that auto-extends on abuse signals. If a target site sends a complaint we kill the host within hours.
The Python + Node clients are MIT and on PyPI / npm. The core engine is closed-source — that's the moat. If you have a self-host requirement (compliance, air-gapped), email hill@triatomine.com and we'll talk about a Scale-tier on-prem deployment.