Site Query Audits: Find Indexing Gaps with Google Search Results
A site: query is a blunt tool. That is exactly why I like it.
It does not replace Google Search Console. It does not prove the exact number of indexed pages. It will not tell you why a URL is missing. But it is fast, visible, and good at catching embarrassing problems: a new docs section that does not appear, a pricing page with an old title, a staging URL that slipped into search, or a set of localized pages that Google is not surfacing at all.
Used as a scheduled audit, site: queries become a cheap smoke test for search visibility.
What a site query audit can catch
Start with practical checks, not abstract index counts.
A good audit can answer questions like:
- Do our most important pages appear for branded site: searches?
- Did the new blog category get indexed after launch?
- Are old staging or preview URLs visible?
- Are localized pages showing the right language in titles and snippets?
- Did a migration leave duplicate paths in Google?
- Are support docs findable by their product terms?
That is enough to justify a small daily or weekly job.
Query patterns worth saving
The useful queries are usually specific. Broad site:example.com searches are noisy and the result count is not reliable enough to treat as a metric.
Better patterns:
site:example.com/docs "API key"
site:example.com/blog "rank tracking"
site:example.com/pricing
site:example.com inurl:staging
site:example.com "404"
site:example.com/en/ "pricing"
site:example.com/fr/ "tarifs"
The point is not to scrape every indexed URL. The point is to check whether Google is showing the pages and templates that matter.
For a SaaS product, I would usually start with:
- homepage and pricing page
- docs home and two important docs articles
- newest five blog posts
- top product landing pages
- localized variants, if they exist
- negative checks for staging, preview, test, and old domains
That list gives you a focused audit instead of another dashboard nobody reads.
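As a sketch, that starter list maps to a handful of saved queries. Every path and phrase below is a placeholder for example.com; substitute your own URLs and product terms.
# Hypothetical starter queries for example.com; paths and phrases are placeholders.
STARTER_QUERIES = [
    "site:example.com",                        # homepage visible at all
    "site:example.com/pricing",                # pricing page
    "site:example.com/docs",                   # docs home
    'site:example.com/docs "API key"',         # important docs article
    'site:example.com/blog "rank tracking"',   # recent blog post
    "site:example.com/landing/teams",          # top landing page
    'site:example.com/fr/ "tarifs"',           # localized variant
    "site:example.com inurl:staging",          # negative check, should return nothing
    "site:example.com inurl:preview",          # negative check, should return nothing
]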
Run the checks through a Google Search API
Manual checks are fine once. They are bad as a habit. If you want repeatable evidence, call the same query on a schedule and store the result.
With SerpBase, a site query is just a normal Google search request:
import requests
API_KEY = "your_api_key"
SEARCH_URL = "https://api.serpbase.dev/google/search"
def google_site_query(query: str, gl: str = "us", hl: str = "en") -> dict:
    # One request per query; gl/hl control the country and language of the SERP.
    response = requests.post(
        SEARCH_URL,
        headers={
            "X-API-Key": API_KEY,
            "Content-Type": "application/json",
        },
        json={"q": query, "gl": gl, "hl": hl, "page": 1},
        timeout=30,
    )
    response.raise_for_status()

    data = response.json()
    # A non-zero status in the body signals a failed search even when HTTP succeeds.
    if data.get("status") != 0:
        raise RuntimeError(data.get("error") or "search failed")
    return data
The response gives you structured organic results, so you can check titles, links, and snippets without parsing HTML.
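A single call is enough to confirm the shape of the response before writing any assertions. A quick sketch, assuming the google_site_query helper above and a placeholder query:
# Quick sanity check: one call, print the top links and titles.
data = google_site_query('site:example.com/docs "API key"')
for result in data.get("organic", [])[:5]:
    print(result.get("link"), "-", result.get("title"))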
Turn search results into audit assertions
A useful audit needs expectations. Otherwise it is just a pile of URLs.
For each check, define:
- the query
- the URL pattern you expect to see
- words that should appear in the title or snippet
- words or hosts that must not appear
- severity if the check fails
A small config file can do the job:
[
{
"name": "pricing page visible",
"query": "site:serpbase.dev/pricing",
"expected_url_contains": "/pricing",
"expected_text": ["pricing", "credits"],
"severity": "high"
},
{
"name": "no staging pages",
"query": "site:serpbase.dev inurl:staging",
"must_be_empty": true,
"severity": "critical"
}
]
Then evaluate the top results.
def result_text(result: dict) -> str:
    # Flatten the fields we care about into one lowercase string for matching.
    return " ".join([
        str(result.get("title") or ""),
        str(result.get("snippet") or ""),
        str(result.get("link") or ""),
    ]).lower()

def evaluate_check(check: dict, data: dict) -> dict:
    organic = data.get("organic", [])

    # Negative checks: any result at all is a failure.
    if check.get("must_be_empty"):
        return {
            "ok": len(organic) == 0,
            "reason": "unexpected indexed results" if organic else "clean",
            "matches": organic[:3],
        }

    url_part = str(check.get("expected_url_contains") or "").lower()
    expected_text = [str(x).lower() for x in check.get("expected_text", [])]

    # Positive checks: the expected URL and wording must appear in the top results.
    for result in organic[:10]:
        link = str(result.get("link") or "").lower()
        text = result_text(result)
        if url_part and url_part not in link:
            continue
        if expected_text and not all(term in text for term in expected_text):
            continue
        return {"ok": True, "reason": "matched", "match": result}

    return {"ok": False, "reason": "expected page not found", "matches": organic[:3]}
This gives you a report a human can act on. It does not just say that something changed. It says which expected page was missing or which forbidden pattern appeared.
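A short runner ties the config file and the two helpers together. This is a sketch: the checks.json file name and printing to stdout instead of paging someone are placeholders.
import json

def run_audit(config_path: str = "checks.json") -> list:
    # Load the checks, run one Google query per check, collect the failures.
    # Assumes google_site_query() and evaluate_check() from above.
    with open(config_path) as f:
        checks = json.load(f)

    failures = []
    for check in checks:
        data = google_site_query(check["query"])
        outcome = evaluate_check(check, data)
        if not outcome["ok"]:
            failures.append({
                "check": check["name"],
                "severity": check.get("severity", "low"),
                "reason": outcome["reason"],
            })
            print(f"[{check.get('severity', 'low').upper()}] {check['name']}: {outcome['reason']}")
    return failures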
Treat result counts with suspicion
Do not build alerts around Google's displayed result count for site: queries. It can swing wildly between runs and is only ever an estimate.
Use concrete checks instead:
- expected URL appears in the top 10
- forbidden URL does not appear
- title contains the current product name
- snippet does not include old pricing or deprecated language
- localized page appears for the matching language query
Those checks are less glamorous, but they catch real issues.
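The "must not appear" items on that list are not covered by evaluate_check as written. One way to handle them is a hypothetical forbidden_text config field, evaluated against the same top results:
def evaluate_forbidden(check: dict, data: dict) -> dict:
    # Sketch: "forbidden_text" is a hypothetical config field, not part of
    # the config shown earlier. Any top result containing a forbidden term fails.
    forbidden = [str(x).lower() for x in check.get("forbidden_text", [])]
    hits = [
        result for result in data.get("organic", [])[:10]
        if any(term in result_text(result) for term in forbidden)
    ]
    return {
        "ok": not hits,
        "reason": "forbidden text found" if hits else "clean",
        "matches": hits[:3],
    }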
Compare snapshots after releases
The best time to run a site query audit is after something changes.
Run it after:
- a CMS migration
- a docs IA change
- a new language launch
- a slug cleanup
- a pricing page rewrite
- a noindex or canonical template change
- a large batch of blog posts
Store the before and after snapshots. If the expected URLs disappear, you have evidence. If old URLs are still visible, you know what to redirect or remove. If snippets still show outdated copy, you know Google has not refreshed that page yet.
A minimal snapshot table is enough:
CREATE TABLE site_query_audits (
id BIGSERIAL PRIMARY KEY,
check_name TEXT NOT NULL,
query TEXT NOT NULL,
gl TEXT NOT NULL,
hl TEXT NOT NULL,
ok BOOLEAN NOT NULL,
reason TEXT NOT NULL,
top_results_json JSONB NOT NULL,
collected_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Keep the recent raw results so someone can inspect what Google actually returned.
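Persisting each evaluated check is one insert per row. A minimal sketch with psycopg2, assuming a DATABASE_URL environment variable and the check and outcome dicts from the evaluation code above:
import json
import os

import psycopg2

def store_audit_row(check: dict, outcome: dict, gl: str = "us", hl: str = "en") -> None:
    # Sketch: write one evaluated check into site_query_audits so the raw
    # top results stay inspectable later. DATABASE_URL is a placeholder.
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    try:
        with conn, conn.cursor() as cur:
            cur.execute(
                """
                INSERT INTO site_query_audits
                    (check_name, query, gl, hl, ok, reason, top_results_json)
                VALUES (%s, %s, %s, %s, %s, %s, %s)
                """,
                (
                    check["name"],
                    check["query"],
                    gl,
                    hl,
                    outcome["ok"],
                    outcome["reason"],
                    json.dumps(outcome.get("matches") or outcome.get("match") or []),
                ),
            )
    finally:
        conn.close()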
Add negative checks for accidental exposure
Positive checks find missing pages. Negative checks find pages that should not be there.
Examples:
site:example.com inurl:staging
site:example.com inurl:preview
site:example.com "internal only"
site:example.com "do not publish"
site:example.com "localhost"
site:example.com "test user"
These are uncomfortable searches, which is why they are useful. A weekly negative audit can catch mistakes that normal rank tracking will never report.
Use Search Console for diagnosis, SERPs for visibility
Search Console is still the better tool for canonical indexing diagnostics. Use it when you need to know crawl status, coverage, canonical selection, or page-level indexing reasons.
A site query audit answers a different question: what can a searcher actually see in Google right now?
That is why the two tools work well together. The API audit catches the visible symptom. Search Console helps diagnose the cause.
Keep the report small
The failure mode for SEO monitoring is too much noise. Do not alert on every changed title or every reordered result.
Alert on checks that have an owner:
- pricing page missing: marketing or growth
- docs page missing: docs or developer relations
- staging URL indexed: engineering
- old domain still ranking: SEO or platform
- wrong language snippet: localization
If nobody knows what to do with the alert, remove it or downgrade it.
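One way to enforce that rule is to keep the owner next to the check and drop anything unowned. A sketch, with placeholder check names and owners, consuming the failures list from the runner above:
# Map failing checks to owners so every alert has a recipient.
# The owner names are placeholders for your own teams.
OWNERS = {
    "pricing page visible": "growth",
    "no staging pages": "engineering",
}

def route_failures(failures: list) -> None:
    for failure in failures:
        owner = OWNERS.get(failure["check"])
        if owner is None:
            # No owner means the check should probably be removed or downgraded.
            continue
        print(f"notify {owner}: {failure['check']} - {failure['reason']}")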
Where SerpBase fits
A site query audit does not need a full enterprise SEO suite. It needs repeatable Google results, country/language targeting, and JSON you can store.
SerpBase is useful for this because each check is a normal search request. You can run a small audit after deploys, schedule broader checks weekly, and keep the output in your own database. No browser automation, no selector maintenance, no manual screenshots.
Start with ten checks. Include five pages that must be visible and five patterns that must stay invisible. After the first real catch, the audit will earn its place in the release checklist.