LLM Visibility Audit: How to Check If ChatGPT, Claude & Perplexity Cite Your Brand


You ask ChatGPT for the best providers in your category. Three competitors you know are weaker than you get named. You don’t. That gap is measurable, and it’s auditable.

Most brands discover their LLM visibility problem by accident — a client mentions ChatGPT recommended a rival, or a prospect says Perplexity “didn’t have much on you.” By then the gap has usually existed for months. A structured audit closes that blind spot. You don’t need specialist tooling to start. You need a reproducible method, a scoring rubric, and the discipline to run it quarterly.

This post walks through the audit we run with consultancy clients before scoping any generative engine optimisation work. It’s deliberately manual. The point isn’t to automate measurement on day one — it’s to build intuition about how LLMs actually represent your brand, where the gaps are, and which fixes matter most.

Why Manual Audit Still Beats Buying Tooling

Specialist LLM visibility tools exist and they’re improving fast. Profound, Otterly, Peec, Athena, AlsoAsked’s AI features, and a growing cluster of platforms all monitor brand mentions across ChatGPT, Perplexity, Gemini, and Claude at scale. They’re useful — especially for enterprise teams tracking dozens of brands, topics, and markets.

But starting with a tool is the wrong move for most Singapore brands. Here’s why.

Running the audit manually forces you to read what the LLM actually says — the hedges, the caveats, the outdated facts, the confident inaccuracies. You notice that ChatGPT keeps calling you a “digital marketing agency” when you’re a specialist consultancy. You see Perplexity cites a three-year-old Reddit thread as its primary source. You spot that Gemini knows your competitor’s founder by name but can’t name yours. Tools surface frequency; manual audit surfaces why.

Cost is the second reason. Specialist platforms typically run SGD 400-2,000/month. For a brand that’s never audited before, spending that before you understand your baseline is backwards. Run manual audits for a quarter, establish what matters, then decide if continuous monitoring justifies the spend. This mirrors how we recommend clients approach answer engine optimisation — measure by hand first, automate once the signal is clear.

The third reason is strategic. A spreadsheet of mentions doesn’t tell you what to do. Reading a bad ChatGPT answer tells you exactly what to fix — a missing Wikipedia entry, a thin About page, a competitor’s podcast appearance that’s anchoring the model’s view of the category.

The Five Dimensions Every LLM Visibility Audit Covers

A thorough audit tests five distinct query types. Each probes a different layer of how LLMs see your brand. Skip any one and you’ll misread your position.


Brand Recognition Queries

Start with direct questions about your brand. This tests whether the model has accurate, substantive information about you at all.

  • “Tell me about [Your Brand].”
  • “What does [Your Brand] do?”
  • “Who founded [Your Brand]?”
  • “Where is [Your Brand] based?”
  • “What is [Your Brand] known for?”

Run each query on ChatGPT (both with and without web browsing), Claude, Gemini, Perplexity, and Copilot. Note whether the answer is accurate, partially accurate, confidently wrong, or absent. “Confidently wrong” is the worst outcome — the model will happily tell prospects things about you that aren’t true.

Category Recognition Queries

Next, test whether you surface when someone asks about your category without naming you.

  • “What are the best SEO consultants in Singapore?”
  • “Who are the top B2B SaaS marketing agencies in APAC?”
  • “Which firms specialise in medical SEO in Southeast Asia?”
  • “Best [your category] in [your geography].”

You’re checking two things: are you mentioned at all, and where in the list. A brand mentioned eighth gets different downstream treatment than one mentioned first. This dimension most closely mirrors traditional search, and it’s the hardest to move without strategic content and citation work.

Comparison Queries

When someone is evaluating options, they often ask comparative questions. These reveal whether LLMs have enough signal to slot you into competitive sets.

  • “[Competitor A] vs [Competitor B] — which is better for [use case]?”
  • “Alternatives to [Competitor].”
  • “Who competes with [Competitor] in Singapore?”

If you compete directly with a known brand and don’t appear in comparison answers, that’s a concrete visibility gap. LLMs build competitive sets from cited articles, directory listings, Reddit threads, and review roundups — if you’re absent from those, you’re absent here.

Recommendation Queries

Shift from “who exists” to “who should I use.” These are bottom-of-funnel.

  • “What SEO consultancy should I hire for a Singapore SaaS company?”
  • “Which digital PR agency is best for a medical clinic in Singapore?”
  • “Recommend a consultant for international SEO expansion from Singapore.”

Recommendation queries are where LLMs get opinionated. The model will name two to five specific brands. Being named here correlates strongly with downstream pipeline — it’s the modern equivalent of a podcast host recommending you by name.

Problem-First Queries

Finally, test problem framings where the user never mentions a category. This is the most authentic way humans actually use LLMs.

  • “My Shopify store’s organic traffic dropped 40% after Google’s last update. Who can help?”
  • “I need someone to audit our site before we launch in Indonesia.”
  • “Our medical clinic in Singapore is invisible on Google. What do I do?”

Problem-first queries reveal whether the model connects your brand to the problems you actually solve. It’s the hardest dimension to move, because it requires content that explicitly addresses symptoms and scenarios — not just service descriptions.

Which Platforms to Test and Why Each Matters

Not all LLMs retrieve information the same way. A complete audit spans at least five surfaces, and you need to understand which use real-time retrieval and which rely primarily on training data.

  • ChatGPT (GPT-4o / GPT-5, without browsing): Pure training-data answers. Reveals what the model “knows” about you from pre-training, which is often 12-18 months stale.
  • ChatGPT (with browsing / search enabled): Retrieves live web results, similar to Bing. Reflects current content and citation landscape.
  • Claude (with and without web search): Anthropic’s retrieval behaviour differs from OpenAI’s. Worth testing both modes because answers often diverge.
  • Gemini: Heavy integration with Google Search. Overlaps most with traditional SEO visibility and AI Overviews.
  • Perplexity: Live retrieval with transparent source citations. Often the clearest window into which domains LLMs trust for your category.
  • Microsoft Copilot: Bing-backed, surfaces in Windows, Edge, and Microsoft 365 contexts. Enterprise buyers use it more than you’d expect.

Run each query on each platform. Yes, it’s tedious. A full audit of a single brand across five query categories and five platforms — with two to three queries per category — produces 50-75 data points. That’s the point. Sampling lightly gives you misleading conclusions.
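
If it helps to see where that data-point count comes from, here’s a minimal Python sketch of the audit matrix. The platform names and bracketed queries are illustrative placeholders, not a canonical list; the total simply falls out of the cross product.

```python
from itertools import product

# Illustrative placeholders -- substitute your own brand, category, and geography.
platforms = [
    "ChatGPT (no browsing)", "ChatGPT (browsing)",
    "Claude", "Gemini", "Perplexity", "Copilot",
]
queries_by_dimension = {
    "brand": ["Tell me about [Brand].", "Who founded [Brand]?"],
    "category": ["Best [category] in [geography].",
                 "Top [category] for [segment]."],
    "comparison": ["Alternatives to [Competitor].",
                   "[Competitor A] vs [Competitor B] for [use case]."],
    "recommendation": ["Recommend a [category] for [scenario]."],
    "problem_first": ["[Pain point]. Who can help?"],
}

# One row per (platform, dimension, query) data point to score later.
audit_plan = [
    {"platform": platform, "dimension": dimension, "query": query}
    for dimension, queries in queries_by_dimension.items()
    for platform, query in product(platforms, queries)
]
print(len(audit_plan))  # 6 surfaces x 8 queries = 48 data points in this example
```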

A Simple 0-3 Scoring Rubric

Consistency matters more than sophistication. We use a four-level scale that any team member can apply reliably.


  • Score 0 (Not mentioned): Brand absent from the answer entirely.
  • Score 1 (Mentioned inaccurately): Named but described wrongly (wrong category, wrong geography, wrong founder, hallucinated facts).
  • Score 2 (Mentioned accurately but thinly): Named correctly, but without detail, context, or endorsement. A one-line mention.
  • Score 3 (Cited or recommended substantively): Featured prominently, described accurately, with context that positions you favourably.

Apply the score per query per platform. A brand might score 2 on ChatGPT-with-browsing for a category query, 0 on Claude, 1 on Perplexity. Average scores across platforms for each dimension, and you get a visibility heatmap that shows exactly where to invest.

Aggregate differently depending on what you’re measuring. For overall presence, average across all 50-75 data points. For platform-specific strategy, average per platform. For content-gap prioritisation, average per dimension.
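
As a rough sketch of that aggregation step, assuming scores are logged as one row per query per platform (a spreadsheet pivot table does the same job, pandas just makes it explicit):

```python
import pandas as pd

# Hypothetical scored results -- in practice, one row per query per platform.
rows = [
    {"platform": "ChatGPT (browsing)", "dimension": "category", "score": 2},
    {"platform": "Claude",             "dimension": "category", "score": 0},
    {"platform": "Perplexity",         "dimension": "category", "score": 1},
    {"platform": "Claude",             "dimension": "brand",    "score": 2},
]
df = pd.DataFrame(rows)

# Overall presence: mean across every data point.
print(df["score"].mean())

# Visibility heatmap: dimension x platform averages.
print(df.pivot_table(values="score", index="dimension",
                     columns="platform", aggfunc="mean"))
```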

Query Template Library You Can Adapt Today

Below is a starting library. Adapt the bracketed variables to your brand, category, and geography — the sketch after the list shows one programmatic way to fill them. Aim for 15-20 queries total in your first audit; more becomes unwieldy, fewer gives you too little signal.

Brand queries:
1. “Tell me about [Brand].”
2. “What does [Brand] do and who runs it?”
3. “Is [Brand] credible / well-regarded?”
4. “Where is [Brand] based and which markets do they serve?”

Category queries:
5. “Best [category] in [geography].”
6. “Top [category] for [specific segment — e.g., Series A SaaS / multi-clinic healthcare].”
7. “Which [category] providers specialise in [niche]?”
8. “[Category] firms with experience in [vertical].”

Comparison queries:
9. “[Competitor 1] vs [Competitor 2].”
10. “Alternatives to [dominant competitor].”
11. “How does [Brand] compare to [Competitor]?”
12. “Who competes with [Competitor] in [geography]?”

Recommendation queries:
13. “What [category] should I hire for [specific scenario]?”
14. “Recommend a [role] for a [company type] in [geography].”
15. “Who should I talk to about [problem] in [geography]?”

Problem-first queries:
16. “My [situation / pain point]. Who can help?”
17. “[Symptom]. What kind of consultant do I need?”
18. “I’m launching [product] in [market]. Who should I work with?”
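
Here’s a minimal sketch of that substitution step, assuming a simple bracket-to-value map. The brand and competitor names are invented for illustration.

```python
# Hypothetical variable map for one brand; extend per audit.
variables = {
    "[Brand]": "Acme Consulting",
    "[category]": "SEO consultancy",
    "[geography]": "Singapore",
    "[Competitor]": "RivalCo",
}

def fill(template: str) -> str:
    """Substitute bracketed placeholders; leave unknown brackets untouched."""
    for placeholder, value in variables.items():
        template = template.replace(placeholder, value)
    return template

print(fill("Best [category] in [geography]."))
# -> "Best SEO consultancy in Singapore."
```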

Record each prompt verbatim, the date, the platform, the model version if visible, and the full response. Screenshots help — LLM answers change, and you’ll want the audit trail in six months.
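
One lightweight way to keep that audit trail is a flat CSV log; the file name and field names below are our own conventions, not a standard.

```python
import csv
from datetime import date
from pathlib import Path

LOG = Path("llm_audit_log.csv")
FIELDS = ["date", "platform", "model_version", "dimension",
          "prompt", "response", "score"]

def record(platform, model_version, dimension, prompt, response, score):
    """Append one verbatim audit data point; writes a header on first use."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "date": date.today().isoformat(),
            "platform": platform,
            "model_version": model_version,
            "dimension": dimension,
            "prompt": prompt,
            "response": response,
            "score": score,
        })
```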

What to Do With the Findings

The audit produces a scored matrix. Now comes the harder part — prioritising remediation. Four patterns cover most situations.


Pattern 1: Accurate but thin mentions. You score 1-2 consistently across brand queries. The model knows you exist but lacks depth. Fix with authoritative entity work — Wikipedia (where eligible), structured data, a thorough About page, clear founder bios, consistent NAP (name, address, phone) across directories. This is standard answer engine optimisation territory.

Pattern 2: Absent from category queries. You score 0 on “best in category” queries. The model has never encountered you in contexts that signal category membership. Fix with citation-building in places LLMs trust: industry roundups, comparison posts, “best of” lists, podcast appearances, expert quotes in trade publications. Our digital PR services exist specifically to produce these signals.

Pattern 3: Confident inaccuracies. You score 1 repeatedly — the model gets things wrong about you. Dangerous because prospects trust it. Fix by flooding the authoritative zone with correct facts: own-site content, structured data, press releases that ranked, controlled third-party profiles. Re-audit in 60-90 days; retraining cycles and retrieval updates take time to propagate.

Pattern 4: Competitor gaps. Competitors appear where you don’t across comparison and recommendation queries. Analyse the sources LLMs cite when naming them — Perplexity’s visible citations are gold for this. Build parity or superiority in those source types. Often this means earned media placements, not just content publishing.

Work top-down: fix brand recognition first (it’s foundational), then category, then comparison and recommendation. Problem-first queries typically improve as the other four dimensions strengthen. Senior strategic work like this is what SEO consultancy engagements are designed to scope and prioritise.

How Often to Re-Run the Audit

Quarterly is the minimum cadence. Models update, retrieval layers change, and the citation landscape shifts weekly. A once-a-year audit will miss regressions and over-credit fixes that didn’t actually hold.

Monthly is realistic for brands with active GEO programmes — the signal-to-noise improves enough to justify the effort. Weekly is overkill unless you’re running experiments or recovering from a specific visibility incident.

Keep prior audits. A spreadsheet showing Q1, Q2, Q3 scores by dimension and platform tells you far more than any single snapshot. Trends matter. A brand moving from 0.8 average to 1.9 average over two quarters is doing something right, even if absolute scores still look low.
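
Assuming the CSV log format sketched earlier, a few lines of pandas turn that history into exactly this trend view; bucketing by calendar quarter is one reasonable choice, not the only one.

```python
import pandas as pd

df = pd.read_csv("llm_audit_log.csv", parse_dates=["date"])
df["quarter"] = df["date"].dt.to_period("Q").astype(str)

# Average visibility score per dimension per quarter -- the trend matters
# more than any single snapshot.
trend = df.pivot_table(values="score", index="dimension",
                       columns="quarter", aggfunc="mean")
print(trend.round(2))
```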

FAQ — LLM Visibility Audits

How often should I run an LLM visibility audit?
Quarterly at minimum, monthly if you’re running an active GEO programme. Models update and the citation landscape shifts faster than traditional SEO, so yearly audits miss too much. Keep historical audits so you can track trends rather than just snapshots.

Do I need specialist tools like Profound or Otterly to audit?
Not to start. Manual audits build the intuition that tools can’t. Once you’ve run two or three quarterly audits by hand and know what matters for your category, specialist monitoring tools become useful — especially if you’re tracking many brands or topics. For most Singapore brands, spend the first quarter auditing manually before committing to a platform.

What if ChatGPT or Claude cites inaccurate information about my brand?
Flood the authoritative sources with correct information — own-site content, structured data, press releases, controlled third-party profiles, Wikipedia where eligible. Re-audit in 60-90 days. Retrieval caches and retraining cycles mean corrections propagate slowly, and some inaccuracies persist until the underlying citations are replaced or outranked.

How do I fix LLM visibility gaps in my category?
Focus on citation-building where LLMs already trust the sources. Industry roundups, “best of” lists, podcast appearances, expert quotes in trade publications, and high-authority comparison content all feed the training and retrieval layers. Our generative engine optimisation work is structured around building these signals systematically.

Can I audit competitors as well as my own brand?
Yes, and you should. Run the same five dimensions with competitor brand names substituted. You’ll see which competitors dominate which dimensions, which sources LLMs cite for them, and where your realistic opportunities sit. Competitor audits often reveal the fastest wins — if a competitor is cited from a specific directory or publication you’re not in, that’s a clear next step.

How long does a full manual audit take?
Plan for 3-5 hours for a single brand across five platforms and 15-20 queries. Add time for scoring, pattern analysis, and remediation prioritisation — a complete audit write-up runs 6-10 hours. The second run is faster once templates and scoring habits are established.

What’s the difference between an LLM visibility audit and AEO?
The audit measures your current state across LLM platforms. AEO is the discipline of optimising for answer engines — Google’s AI Overviews, featured snippets, and LLM answers — once you know where the gaps are. Audit first, optimise second. One produces the diagnosis; the other delivers the treatment.

Does LLM visibility affect actual pipeline?
Increasingly, yes. B2B buyers consult LLMs during research phases far more than they admit. We’ve had Singapore clients close deals where the prospect explicitly said ChatGPT recommended them, and lost deals where Perplexity surfaced a competitor’s case study and not theirs. The pipeline impact is hard to attribute cleanly, but it’s real and growing.

Discuss Your LLM Visibility Strategy

If you’ve run the audit informally and seen gaps — or want a structured first-pass audit scoped and delivered — get in touch. We run LLM visibility audits as standalone projects and as the starting point for ongoing GEO engagements.

Book a free 30-minute consultation or email [email protected].
