Spam Traffic Analysis

Window: 23 Mar – 28 Jun 2026 (full weeks)  ·  Source: GA4 → BigQuery export  ·  Spam label: page-views in Direct sessions with ≥50 page-views or ≥600 events

TL;DR for the dev team

Bad-bot traffic collapses onto one request-time signature: Windows 10 + Chrome 146, from datacenter IPs (Portland OR, LA), scraping the /agency/* directory. The user-agent alone accounts for 52% of all spam page-views at ~88% precision. Every table below carries a spam-% of segment column, so you can challenge (reCAPTCHA / WAF) the high-precision fingerprints without country-blocking real US/India users or touching the ~87%-legitimate AI Assistant channel.

1. User-agent fingerprint

Spam is a page-view phenomenon: 0.30% of sessions (≈4,000) produce 24.5% of all page-views, because each bot session loads dozens to hundreds of pages. The dev team asked which requests to act on, and the user-agent answers that almost by itself.

Windows 10 · Chrome 146 → 52% of all spam, 88% of that segment is spam
One UA string. Windows 10 across all browser versions carries 76% of spam; Windows 11, by contrast, is a normal ~11%.
  • The bots pin an outdated or mismatched UA: Windows 10 with Chrome 146, plus older builds (Chrome 116, 120, 124) that no real user base still runs at volume. These off-version strings are high-precision too: Chrome 124 (83%), Chrome 116 on Linux (75%), Chrome 150 (83%).
  • Read the spam-% of segment column as false-positive risk. At 88% spam, blocking Windows 10 · Chrome 146 outright would also block the ~12% of that segment's page-views that come from real users: too high for a hard block, ideal for a reCAPTCHA challenge (humans pass, headless bots don't). The sub-segments above 95% are safe to deny outright.

Windows 10 alone accounts for 76% of all spam, and 47% of its page-views are spam. That makes it the cleanest coarse filter if a per-version rule is too granular to maintain, though only as a challenge: roughly half of what it catches is real traffic.
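The two numbers quoted for every segment (share of all spam, and spam-% of segment) and the block-or-challenge reading of them can be sketched as below. This is a minimal illustration, not the team's rule engine: the `Segment` type, the function names, and the example counts are hypothetical, and treating the unspecified 90–95% gap as "challenge" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    spam_page_views: int    # page-views in the segment that carry the spam label
    total_page_views: int   # all page-views in the segment, spam or not


def spam_pct_of_segment(seg: Segment) -> float:
    """Precision of the segment as a filter: spam page-views / all page-views."""
    return seg.spam_page_views / seg.total_page_views


def share_of_all_spam(seg: Segment, total_spam_page_views: int) -> float:
    """How much of the overall spam load this segment carries."""
    return seg.spam_page_views / total_spam_page_views


def suggested_action(precision: float,
                     deny_at: float = 0.95,
                     challenge_at: float = 0.75) -> str:
    """Map spam-% of segment to an action.

    Thresholds follow the report's bands (>95% deny, 75-90% challenge);
    the 90-95% gap is folded into "challenge" as an assumption.
    """
    if precision >= deny_at:
        return "deny"
    if precision >= challenge_at:
        return "challenge"
    return "allow"


# Hypothetical counts, chosen only to reproduce an 88% precision segment.
example = Segment("example-ua", spam_page_views=880, total_page_views=1000)
print(suggested_action(spam_pct_of_segment(example)))
```

A segment at 47% precision, such as Windows 10 as a whole, falls below the assumed challenge threshold here; lowering `challenge_at` is the knob to turn if the coarse filter is preferred.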

Screen resolution

Screen resolution isn't in the GA4 → BigQuery export, so this one table is pulled from the GA4 Data API (screenResolution dimension). The API can't express our "≥50 page-views per session" threshold, but it returns page-views and sessions per resolution — so page-views per session stands in as the bot-flood density (a real user runs ~1–3; the flood bot runs 100+):

  • 1600x900 is the resolution fingerprint: 304k page-views across just 2,591 sessions — 117 per session. Every ordinary resolution (1920x1080, 3840x2160) sits at ~1. High volume alone is a red herring — 800x600 has 217k page-views but only ~1.2 per session, so it isn't the flood bot.
  • That 1600x900 total (303,965 page-views, 303,626 of them on Chrome) matches the Windows 10 · Chrome 146 signature's 304,083 spam page-views to within 0.1%. It is almost certainly the same bot, now pinned on a fourth axis: Windows 10 + Chrome 146 + 1600x900 + datacenter IP, scraping /agency/*.
  • A couple of oddball high-density resolutions (1366x900, 2732x1536) run 30–40 page-views per session on tiny session counts — smaller bot variants worth a challenge too.
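The density proxy used in this table is a single division, sketched below with the 1600x900 figures quoted above. The function names and the flag threshold of 30 page-views per session are assumptions, picked so that the 30–40 oddball variants would still be caught.

```python
def page_views_per_session(page_views: int, sessions: int) -> float:
    """Bot-flood density proxy: average page-views per session for one resolution."""
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    return page_views / sessions


def looks_like_flood(page_views: int, sessions: int, threshold: float = 30.0) -> bool:
    """Flag a resolution whose density is far above the ~1-3 a real user runs.

    The default threshold is an assumption, not a figure from the report.
    """
    return page_views_per_session(page_views, sessions) >= threshold


# 1600x900 as reported: 303,965 page-views over 2,591 sessions.
density = page_views_per_session(303_965, 2_591)
print(f"{density:.1f} page-views per session")
```

Volume does not enter the check at all, which is the point of the 800x600 counter-example: a large page-view count at ~1.2 per session stays unflagged.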

2. Origin

The original brief suspected Chinese traffic. The data doesn't support that (Chinese-language traffic is 1.2% spam; China isn't a top origin). More importantly, the country view is a trap: the US (56%) and India (15%) lead only because they're our biggest real markets and where cloud regions sit. Drop to city and the datacenter mask falls off:

  • Portland OR (98% spam) and Los Angeles (89%) together carry 43% of all spam. These are cloud/hosting locations (AWS's us-west-2 region is in Oregon), not consumer populations: a session count in the low hundreds throwing ~250k page-views.
  • Council Bluffs, Iowa (75%) hosts a Google Cloud region; Singapore, Milpitas, Strasbourg, Midrand and Czechia (a country-level entry) all run 77–98% off tiny real bases. Anything above ~95% here is a safe IP-range / ASN block; the 75–90% band is a challenge target.
  • This is why a country block is the wrong tool: it would hit the bulk of our real US and Indian users to catch bots that are actually confined to a handful of hosting locations.
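At request time, the origin rule reduces to matching the client IP against a short list of network ranges rather than a country. A minimal sketch using Python's standard `ipaddress` module follows; the CIDR ranges are placeholders from the reserved documentation blocks, not the bots' real ranges, and `origin_action` is a hypothetical name.

```python
import ipaddress

# Placeholder networks (RFC 5737 documentation ranges), NOT real bot ranges.
# Real entries would come from confirmed ASN / hosting-provider lookups.
DENY_RANGES = [ipaddress.ip_network("192.0.2.0/24")]          # stands in for >95% spam origins
CHALLENGE_RANGES = [ipaddress.ip_network("198.51.100.0/24")]  # stands in for the 75-90% band


def origin_action(client_ip: str) -> str:
    """Return "deny", "challenge" or "allow" for a client IP."""
    addr = ipaddress.ip_address(client_ip)
    # Membership tests across IP versions simply evaluate to False.
    if any(addr in net for net in DENY_RANGES):
        return "deny"
    if any(addr in net for net in CHALLENGE_RANGES):
        return "challenge"
    return "allow"


print(origin_action("192.0.2.17"))
print(origin_action("203.0.113.9"))
```

Because the lists name networks instead of countries, an ordinary US or Indian consumer IP falls through to "allow".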

3. Target

The bots don't behave like lost visitors. They skip the homepage (/ is only 6% spam) and go straight at the product — the agency listings and profiles:

  • Category listings — /agency/email-marketing (94%), /agency/digital-marketing/us (84%), /agency/software-development (76%) — are near-pure spam.
  • The tell-tale for scraping: single sessions that hammer one /agency/profile/<name> page for thousands of page-views at 100% spam (e.g. one session, ~7,700 views of a single profile). Together with the listing crawl, this is consistent with someone systematically harvesting the agency database.
  • Actionable because it's a request-time signal too: the /agency/* tree can carry a stricter challenge/rate-limit than the rest of the site, where real users and legitimate AI crawlers spend their time.
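One way to give the /agency/* tree a stricter budget than the rest of the site is a sliding-window limiter with a per-path limit, sketched below. The limits, the window length, and the class name are all assumptions for illustration; a production setup would more likely live in the WAF or reverse proxy than in application code.

```python
from collections import defaultdict, deque

AGENCY_LIMIT = 30       # assumed: requests per window under /agency/
DEFAULT_LIMIT = 120     # assumed: requests per window elsewhere
WINDOW_SECONDS = 60.0   # assumed window length


class PathAwareRateLimiter:
    """Sliding-window limiter with a tighter budget for /agency/ paths."""

    def __init__(self) -> None:
        # (client_id, bucket) -> timestamps of accepted requests
        self._hits = defaultdict(deque)

    def allow(self, client_id: str, path: str, now: float) -> bool:
        bucket = "agency" if path.startswith("/agency/") else "default"
        limit = AGENCY_LIMIT if bucket == "agency" else DEFAULT_LIMIT
        hits = self._hits[(client_id, bucket)]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= limit:
            return False  # over budget: caller would challenge or reject
        hits.append(now)
        return True


limiter = PathAwareRateLimiter()
# One request per second against a listing page from a single client.
results = [limiter.allow("client-1", "/agency/email-marketing", now=float(t)) for t in range(31)]
print(results.count(True), "allowed,", results.count(False), "rejected")
```

The two buckets are counted separately, so a client that exhausts its /agency/ budget can still load the homepage.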

4. Scale and timing

By page-view share, spam runs ~10–17% at baseline with two clear floods on top:

  • Late Mar – early Apr: spam jumps to ~47–48% of the week.
  • June: a sustained 22–28% band.

The spikes don't track real demand — total page-views rise only because the spam layer is added on top. The same UA/city/URL fingerprint dominates both incidents, so one rule set covers the recurring pattern.

5. What it distorts

Because spam is so few sessions, it barely moves the session engagement rate: strip it out and the weekly rate is essentially unchanged.

But every page-view-weighted metric is hit. Engagement time per page-view (a fair proxy for the per-page quality signals AI search weighs) dips exactly in the spike weeks, because bots add page-views with almost no engagement behind them.

So "spam plummets our engagement metrics" is true for page-view-weighted metrics, not session-rate metrics. The reported page-view counts themselves are inflated too: in an incident week up to ~48% of them are spam, so the reported total runs close to double the human figure.
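The arithmetic behind both effects is short enough to write out. In the sketch below, a spam share `s` of reported page-views overstates the human count by `s / (1 - s)`, and a page-view-weighted average is diluted in proportion to the spam layer. The function names and the 40-second engagement figure are hypothetical; only the 48% share comes from the report.

```python
def inflation_over_human(spam_share: float) -> float:
    """Fractional overstatement of reported page-views relative to human-only page-views."""
    if not 0.0 <= spam_share < 1.0:
        raise ValueError("spam_share must be in [0, 1)")
    return spam_share / (1.0 - spam_share)


def blended_engagement(human_pv: int, human_secs_per_pv: float,
                       spam_pv: int, spam_secs_per_pv: float = 0.0) -> float:
    """Page-view-weighted engagement time once spam page-views are mixed in."""
    total_pv = human_pv + spam_pv
    return (human_pv * human_secs_per_pv + spam_pv * spam_secs_per_pv) / total_pv


# Incident week: ~48% of reported page-views are spam.
print(f"reported count overstates human traffic by {inflation_over_human(0.48):.0%}")

# Hypothetical 40 s of engagement per human page-view, zero for bots.
print(f"{blended_engagement(52, 40.0, 48):.1f} s per page-view after dilution")
```

Session-rate metrics escape this because the spam layer is a tiny fraction of sessions; it only dominates once each row is weighted by page-views.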

Recommendations

The signatures overlap (it's largely the same bots), so a small rule set covers most of the load. Each table gives spam-% of segment as the false-positive gauge — block the near-pure segments, challenge the rest.

  1. Challenge the UA fingerprint. Serve a reCAPTCHA / JS challenge to Windows 10 + Chrome 146 and the off-version Chrome family (116/120/124/150). ~52% of all spam sits in the first string alone, at 88% precision; challenge rather than hard-block so the ~12% of that traffic that is real still gets through. A 1600x900 screen resolution is a corroborating client-side signal for the same bot (117 page-views/session vs ~1 for ordinary resolutions).
  2. Block the datacenter origins by IP/ASN. The >95% locations (Portland OR, Milpitas, Strasbourg, Midrand, Czechia) are safe to deny once their ASNs are confirmed (the cloud attributions here are inferred from geo); challenge the 75–90% band (LA, Singapore, Council Bluffs). This captures the US "volume" without a country ban.
  3. Harden the /agency/* tree. Rate-limit and challenge agency listings and profile pages more strictly than the rest of the site; that's what the bots are scraping.
  4. Don't country-block, and spare the AI Assistant channel. US/India are our real markets; the AI Assistant channel is ~87% legitimate and strategically important — exclude it from any rule.
  5. Report human-only for page-view metrics. Session engagement rate is already robust; page-view counts and engagement-per-page-view should be reported spam-excluded so incidents don't read as demand swings.
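For recommendation 1, a request-time check on the User-Agent header might look like the sketch below. Chrome's reduced UA string reports "Windows NT 10.0" on both Windows 10 and Windows 11, so the header alone cannot reproduce GA4's Windows 10 / 11 split. The optional `Sec-CH-UA-Platform-Version` client hint (major version 13 or higher indicates Windows 11) can narrow it, but browsers only send it when the site opts in via `Accept-CH`. The function names, the exclusion of Edge/Opera tokens, and the choice to challenge when the hint is absent are assumptions.

```python
import re
from typing import Optional

# Chrome majors named in the report besides 146.
OFF_VERSION_CHROME_MAJORS = {116, 120, 124, 150}
_CHROME_RE = re.compile(r"Chrome/(\d+)\.")
# Other Chromium browsers also carry a Chrome/ token; skip them (assumption).
_OTHER_CHROMIUM_TOKENS = ("Edg/", "OPR/")


def chrome_major(user_agent: str) -> Optional[int]:
    """Major Chrome version from a UA string, or None if not plain Chrome."""
    if any(token in user_agent for token in _OTHER_CHROMIUM_TOKENS):
        return None
    match = _CHROME_RE.search(user_agent)
    return int(match.group(1)) if match else None


def is_windows_11(platform_version: Optional[str]) -> bool:
    """Interpret a Sec-CH-UA-Platform-Version value such as '"15.0.0"'."""
    if not platform_version:
        return False  # hint absent: cannot tell Windows 10 from 11
    major = platform_version.strip().strip('"').split(".")[0]
    return major.isdigit() and int(major) >= 13


def should_challenge(user_agent: str, platform_version: Optional[str] = None) -> bool:
    major = chrome_major(user_agent)
    if major is None:
        return False
    if major in OFF_VERSION_CHROME_MAJORS:
        return True
    return (major == 146
            and "Windows NT 10.0" in user_agent
            and not is_windows_11(platform_version))


ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36")
print(should_challenge(ua))
```

Returning a challenge decision rather than a block keeps the ~12% of real users in the segment able to pass.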

Notes & caveats

  • Spam label (from the task): page-views in Direct sessions (event-param medium in the direct set) with ≥50 page-views OR ≥600 events. The fingerprints above are measured on that population; the dev team applies them at request time, before a session accumulates page-views.
  • Screen resolution isn't in the GA4 → BigQuery export, so §1's resolution table comes from the GA4 Data API (screenResolution). The Data API is pre-aggregated and can't apply our per-session spam threshold, so that table uses page-views-per-session as the bot-density proxy rather than the exact ≥50-page-view rule; its counts may differ slightly from the BigQuery figures.
  • Device model is empty because spam is ~98% desktop; language is non-discriminating (spam is en-us, like real traffic).
  • Geo is IP-based and masked for datacenter/VPN bots; cloud attributions above are inferred from city + volume, not confirmed ASNs.
  • Window: 23 Mar – 28 Jun 2026, full weeks — the export doesn't reach earlier, so pre-March incidents (including older China spikes) are out of view.
  • Rankings are out of scope for GA4; correlating bot hits with ranking dips needs Search Console (a natural next step).
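The spam label in the first note can be written as a single predicate over per-session totals. This is a sketch of the stated rule, not the actual BigQuery query: `SessionSummary` is a hypothetical shape, and the contents of `DIRECT_MEDIUMS` are an assumption, since the report only refers to "the direct set".

```python
from dataclasses import dataclass


@dataclass
class SessionSummary:
    medium: str       # session medium from the event params
    page_views: int   # page_view events in the session
    events: int       # all events in the session


# Assumed membership of "the direct set"; adjust to the real definition.
DIRECT_MEDIUMS = {"(none)", "(direct)", ""}


def is_spam_session(session: SessionSummary) -> bool:
    """Direct session with >=50 page-views OR >=600 events."""
    return session.medium in DIRECT_MEDIUMS and (
        session.page_views >= 50 or session.events >= 600
    )


print(is_spam_session(SessionSummary("(none)", page_views=117, events=240)))
print(is_spam_session(SessionSummary("organic", page_views=117, events=240)))
```

The label is a retrospective, per-session test; the request-time fingerprints in the sections above are what stand in for it before a session has accumulated any page-views.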
