
RankLens Entities: AI Brand Visibility (Open-Source)

Measuring how LLMs recommend brands & sites — with open data, code, and a reproducible method

Summary: We ran 15,600 samples across 52 categories/locales to study how LLMs (ChatGPT, Claude, Gemini, etc.) mention or rank brands/sites. We open-sourced the method (Entity-Conditioned Probing + Resampling), data, and code so teams can audit, reproduce, and extend.

Why this matters

LLMs increasingly act like recommenders. For queries like “best [tool]” or “top [service]”, users often see a small set of brands/sites. If you’re building AI products or care about brand visibility, you need answers to:

  • Which brands/sites appear most often?
  • How stable are those results across samples, locales, and models?
  • How reliable is a “top-k” list derived from an LLM?

Our goal: provide a transparent, reproducible way to measure and discuss LLM “recommendations.” – Jim Liu, CEO, SEO Vendor

What is RankLens Entities?

A public, reproducible suite for measuring LLM brand/site visibility:

  • Method: Entity-Conditioned Probing (ECP) with multi-sampling + half-split consensus to estimate reliability (overlap@k); a minimal probing sketch follows the diagram below.
  • Data: 15,600 samples over 52 categories/locales, with parsed entities and metadata.
  • Code: Scripts to aggregate lists, compute consensus top-k, and evaluate reliability.
[Figure: ECP sampling + half-split consensus diagram]
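
For concreteness, here is a minimal sketch of the probing and parsing steps. It is an assumed illustration, not the study's exact setup: `query_model` stands in for whatever LLM client you use, and the prompt wording, sample count, and the naive numbered-list parser are all placeholders.

```python
import re

# Hypothetical sketch: `query_model` stands in for a real LLM client;
# prompt text and sample count are illustrative only.
PROMPT_TEMPLATE = "What are the best {category} for users in {locale}? List the top options."

def probe(query_model, category, locale, n_samples=20):
    """Send the same standardized prompt n_samples times; collect raw responses."""
    prompt = PROMPT_TEMPLATE.format(category=category, locale=locale)
    return [query_model(prompt) for _ in range(n_samples)]

def parse_entities(response_text):
    """Naively pull entity names out of a numbered-list style response."""
    return [m.group(1).strip()
            for m in re.finditer(r"^\s*\d+[.)]\s*(.+)$", response_text, re.M)]

# Usage with a stub model (swap in a real API call):
stub = lambda prompt: "1. Acme CRM\n2. ExampleSoft\n3. acme.com"
lists = [parse_entities(r) for r in probe(stub, "CRM tools", "en-US", n_samples=3)]
```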

Key findings (high level)

  • Reliability varies by category & locale. Some categories show high overlap@k; others are volatile.
  • Alias risk is real. The same brand can appear under multiple strings (e.g., “Acme CRM”, “acme.com”). Normalization helps but isn’t perfect.
  • Prompt sensitivity exists. Small phrasing changes can shift inclusion/ordering—use multi-sampling and track prompts.
  • Model drift happens. Time-stamp runs and avoid comparing apples to oranges across model updates.

Who it’s for

  • AI/ML engineers: a reproducible eval harness for LLM list-style answers.
  • SEO/marketing teams: visibility tracking for brands/sites in LLM responses.
  • Researchers: open artifacts + method to extend, critique, or benchmark.

What’s included

  • Preprint: full method + limitations and guidance.
  • Repository: parsing, consensus, reliability metrics, export utilities.
  • Dataset: per-prompt lists, JSONL results, schema docs.
  • Examples: ready-to-run code for overlap@k and consensus top-k.


How it works (quick overview)

  1. Probe each (category, locale, model) with standardized prompts.
  2. Parse responses into entity lists (brands/sites).
  3. Resample & split: randomly divide the sampled lists into two halves.
  4. Consensus: compute top-k for each half via frequency aggregation.
  5. Reliability: measure overlap@k between the halves.
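
Steps 3–5 reduce to a few lines. Below is a minimal sketch, assuming each sample has already been parsed into a list of entity strings; the set-based frequency aggregation, arbitrary tie-breaking, and single random split are simplifying assumptions, so check the repo for the exact implementation (e.g., it may average over many splits).

```python
import random
from collections import Counter

def consensus_top_k(samples, k):
    """Frequency-aggregate entity lists (one list per sample) into a top-k."""
    counts = Counter(entity for lst in samples for entity in set(lst))
    return [entity for entity, _ in counts.most_common(k)]  # ties break arbitrarily

def overlap_at_k(samples, k, seed=0):
    """Half-split consensus: shuffle samples, split in two, compare top-k sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    top_a = set(consensus_top_k(shuffled[:half], k))
    top_b = set(consensus_top_k(shuffled[half:], k))
    return len(top_a & top_b) / k  # 1.0 = identical top-k sets, 0.0 = disjoint

# Toy example: 4 parsed samples, k=2; in practice, average over many splits.
samples = [["acme", "examplesoft"], ["acme", "widgetco"],
           ["acme", "examplesoft"], ["examplesoft", "acme"]]
print(overlap_at_k(samples, k=2))
```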

If overlap@k is high, the derived top-k is likely stable for that setup; if it's low, treat any single top-k as noisy.

Limitations (read before using)

  • Not a ranking of “truth,” but a measurement of what LLMs surface under given conditions.
  • Aliases & normalization: we include baseline rules; entity resolution is imperfect.
  • Prompt & temperature sensitivity: track your configurations; use multi-sampling.
  • Locale effects: some markets/categories are inherently more volatile.
  • Model updates: repeat runs over time to monitor drift.

Get started

  1. Skim the preprint for method + caveats.
  2. Clone the repo and run the overlap@k example.
  3. Load the dataset (CSV/JSONL) and reproduce a chart (a loading sketch follows this list).
  4. Add your category/locale and compare reliability.
  5. Open a PR with improvements (normalizers, metrics, charts).
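
As an illustration of step 3, here is a sketch for loading the JSONL release. The file name and field names (`category`, `locale`, `model`, `entities`) are assumptions; consult the dataset's schema docs for the actual layout.

```python
import json
from collections import defaultdict

# Hypothetical file and field names -- see the dataset's schema docs.
records = []
with open("ranklens_entities.jsonl", encoding="utf-8") as f:
    for line in f:
        records.append(json.loads(line))

# Group parsed entity lists by experimental setup for downstream analysis,
# e.g., to feed them into an overlap@k computation per (category, locale, model).
by_setup = defaultdict(list)
for r in records:
    by_setup[(r["category"], r["locale"], r["model"])].append(r["entities"])
```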


FAQ

Is this a product?
No—the study and artifacts are open for research and transparency; they inform products but aren’t a product themselves.

Can I cite this?
Yes. Cite the Zenodo preprint; include the GitHub repo and dataset link.

What’s “alias risk”?
Multiple strings referring to the same brand/site (e.g., name variants, URL vs. brand). We apply normalization; you can extend it.
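
To make the failure mode concrete, a baseline normalizer might look like the sketch below. These rules are assumed for illustration and are not the repo's actual normalization.

```python
import re
from urllib.parse import urlparse

def normalize_entity(raw: str) -> str:
    """Collapse common surface variants (case, URL forms, legal suffixes)."""
    s = raw.strip().lower()
    # Reduce URL-like strings to their host.
    if "://" in s or s.startswith("www."):
        s = urlparse(s if "://" in s else "http://" + s).netloc or s
        s = s.removeprefix("www.")
    s = re.sub(r"[^\w\s.&-]", "", s)                     # drop stray punctuation
    s = re.sub(r"\s+(inc|llc|ltd)\.?$", "", s).strip()   # drop legal suffixes
    return s

# Note: "Acme CRM" and "acme.com" still normalize to different keys;
# rules like these need an alias lookup table on top to merge such pairs.
print(normalize_entity("https://www.Acme.com"))  # -> "acme.com"
```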

Can I use this commercially?
Check the licenses in the repo/dataset. Contributions welcome.