Exploring Early Strategies for Monitoring AI Overviews and Language Model Visibility
As the landscape of artificial intelligence continues to evolve rapidly, understanding how large language models (LLMs) and AI Overviews surface and cite content remains a significant challenge. There is currently a notable lack of comprehensive datasets that capture performance across these platforms, which makes it difficult to establish benchmarks or track growth.
Despite these limitations, initial insights can be gleaned through indirect methods such as referral analytics, which offer early signals about content engagement and user interest (a minimal referral-tally sketch follows the question below). One of the key questions in this area is:
What content are users actually consuming when they search for problem-solving queries within these AI models?
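As one concrete example of what those indirect referral signals can look like, here is a minimal Python sketch that tallies landing pages reached via known AI-platform referrers. The CSV file name, column names, and the referrer list are assumptions for illustration; adapt them to whatever your analytics tool or server logs actually export.

```python
# Minimal sketch: tally landing pages reached via AI-platform referrers.
# Assumes a hypothetical "referrals.csv" export with "referrer" and
# "landing_page" columns; adjust the file, columns, and domain list to your data.
import csv
from collections import Counter
from urllib.parse import urlparse

# Referrer domains commonly associated with AI assistants (extend as needed).
AI_REFERRERS = {
    "chatgpt.com",
    "chat.openai.com",
    "perplexity.ai",
    "gemini.google.com",
    "copilot.microsoft.com",
}

def ai_referral_counts(path: str) -> Counter:
    """Count landing pages whose referrer belongs to a known AI platform."""
    counts: Counter = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            ref = row.get("referrer", "") or ""
            # Handle both full URLs and bare domains in the referrer field.
            host = urlparse(ref if "//" in ref else "//" + ref).netloc.lower()
            if host.removeprefix("www.") in AI_REFERRERS:
                counts[row.get("landing_page", "")] += 1
    return counts

if __name__ == "__main__":
    for page, hits in ai_referral_counts("referrals.csv").most_common(10):
        print(f"{hits:4d}  {page}")
```

The top of that list is a rough proxy for which pages AI assistants are actually sending people to, which in turn hints at what problem-solving content they are consuming.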
Over the past few days, I’ve been experimenting with various approaches to shed light on this question. Here are some preliminary observations:
- Unique Citation Patterns: Each LLM appears to have its own approach to sourcing and citing information. The sources and references they pull from aren’t always consistent across different platforms.
- Shared Domains Across Platforms: Roughly 40-50% of the time, the same domains appear in the citation outputs of multiple AI platforms, including Google AI Overviews, ChatGPT, and Perplexity. This overlap suggests that a set of authoritative sources dominates how these models retrieve and cite content (a rough overlap calculation is sketched after this list).
- Using Reverse Engineering for Visibility Insights: By leveraging tools like Semrush's question-based Answer Engine Optimization (AEO) data and comparing it against real user queries, I've started connecting the dots on how certain queries translate into visibility or rankings within these models (a rough sketch of this matching step appears at the end of these observations).
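To make that overlap figure reproducible, here is a minimal Python sketch that reduces cited URLs to bare domains and computes pairwise overlap between platforms. The citation lists are placeholders only; in practice you would collect the cited URLs by hand or from whatever export your tooling provides.

```python
# Minimal sketch: estimate citation-domain overlap across AI platforms.
from itertools import combinations
from urllib.parse import urlparse

def domains(urls: list[str]) -> set[str]:
    """Reduce a list of cited URLs to bare domains for comparison."""
    return {urlparse(u).netloc.lower().removeprefix("www.") for u in urls}

# Hypothetical citation pulls for the same query on each platform.
citations = {
    "google_ai_overviews": ["https://example.com/guide", "https://docs.example.org/faq"],
    "chatgpt":             ["https://www.example.com/guide", "https://another-site.com/post"],
    "perplexity":          ["https://example.com/guide", "https://third-site.net/article"],
}

platform_domains = {name: domains(urls) for name, urls in citations.items()}

# Pairwise Jaccard overlap: shared domains divided by all domains cited by either platform.
for (a, da), (b, db) in combinations(platform_domains.items(), 2):
    jaccard = len(da & db) / len(da | db) if (da | db) else 0.0
    print(f"{a} vs {b}: {jaccard:.0%} domain overlap")
```

Running this per query and averaging the pairwise scores gives a simple, repeatable way to sanity-check the 40-50% figure against your own data.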
While these methods are still rudimentary, I've begun to see promising signs, such as content beginning to rank, or at least be cited, within ChatGPT and Perplexity. It's clear, however, that our understanding of AI and LLM performance metrics is still in its early stages.
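Finally, here is a rough sketch of the query-matching step mentioned above: it normalizes question-style queries from a keyword-tool export and looks for near matches among queries observed in your own analytics or Search Console data. The file names, column names, and the 0.6 similarity threshold are all assumptions for illustration, not part of any tool's API.

```python
# Minimal sketch: match question-style queries from a keyword-tool export
# against queries observed in your own analytics or Search Console data.
import csv
import re

def normalize(query: str) -> frozenset[str]:
    """Lowercase a query and reduce it to its set of alphanumeric tokens."""
    return frozenset(re.findall(r"[a-z0-9]+", query.lower()))

def load_queries(path: str, column: str) -> list[tuple[frozenset[str], str]]:
    """Read queries from a CSV column and pair each with its token set."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(normalize(row[column]), row[column]) for row in csv.DictReader(f)]

def overlap(a: frozenset[str], b: frozenset[str]) -> float:
    """Token-level Jaccard similarity between two queries."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

if __name__ == "__main__":
    tool_questions = load_queries("tool_questions.csv", "question")  # placeholder export
    observed = load_queries("observed_queries.csv", "query")         # placeholder export
    # Report tool questions that closely resemble queries users actually ran.
    for tokens, text in tool_questions:
        best = max(observed, key=lambda item: overlap(tokens, item[0]), default=None)
        if best and overlap(tokens, best[0]) >= 0.6:
            print(f"{text!r} ~ {best[1]!r}")
```

The matched pairs give a starting list of question-style queries worth testing directly in each platform to see whether your content is cited.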
Looking Forward
Gauging the performance and visibility of AI models remains an evolving challenge, but innovative approaches—such as analyzing citation patterns and reverse-engineering query data—are beginning to provide valuable insights. As the field matures, I anticipate that more sophisticated tools and datasets will emerge to enable more accurate tracking.
I’m eager to hear from others in the community:
How are you currently tracking or assessing the visibility and performance of AI overviews and large language models?
Sharing practices and experiences can help accelerate our collective understanding in this emerging area.