The Google Search Data Leak and Its Implications for SEO

A couple of days ago, a Google Search data leak surprised the Search Engine Optimisation (SEO) community.

Initially shared by an anonymous source (later identified as Erfan Azimi) and subsequently analysed by SEO experts Rand Fishkin and Mike King, the leak comprises over 2,500 documents detailing more than 14,000 attributes apparently used in Google’s ranking systems. Google has confirmed the authenticity of the documents. This leak has exposed internal details about Google’s ranking mechanisms, sparking widespread debate about its implications for SEO.

What Is an API?

An API, or Application Programming Interface, allows systems to communicate by enabling data queries and retrieval. For example, the Google Maps API lets developers integrate interactive maps into websites by pulling location data directly from Google’s system.

Understanding the Google Search API

The Google Search API provides programmatic access to Google’s search results, enabling developers to integrate search functionality into applications or websites. While the Google Search Console API is commonly used to report on search performance (clicks, impressions and average position) and the Google Analytics API on site traffic, the Search API is primarily used for custom search solutions.
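To make the idea of "programmatic access" concrete, here is a minimal sketch in Python. It assumes that "Google Search API" refers to Google's public Custom Search JSON API, and it only builds the request URL rather than calling it. The key and engine ID are placeholders, and the helper name build_search_url is my own, not part of any Google library.

```python
from urllib.parse import urlencode

# Public endpoint of Google's Custom Search JSON API.
ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query: str, api_key: str, engine_id: str, num: int = 10) -> str:
    """Return the GET URL for one page of search results.

    build_search_url is a hypothetical helper written for this article.
    """
    params = {
        "key": api_key,   # your API key (placeholder below)
        "cx": engine_id,  # your Programmable Search Engine ID (placeholder below)
        "q": query,       # the search terms
        "num": num,       # results per page (the API allows 1 to 10)
    }
    return f"{ENDPOINT}?{urlencode(params)}"


url = build_search_url("local seo consultant", "YOUR_API_KEY", "YOUR_ENGINE_ID")
print(url)
```

Sending a GET request to that URL with a valid key and engine ID returns JSON, with each result under "items" carrying fields such as "title", "link" and "snippet".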

Key Findings and Contradictions

The leaked documents reveal internal terminology and potential ranking factors, some of which appear to contradict Google’s long-held public statements:

  • User Engagement Signals: Despite Google often downplaying direct click metrics, the documents reference systems like “Navboost” and terms such as “goodClicks,” “badClicks,” and “lastLongestClicks,” strongly suggesting user interaction data plays a role.
  • Chrome Browser Data: Leaked materials seem to indicate that data from the Chrome browser (like page views) might be used in rankings, contrary to Google’s public denials.
  • “Sandbox” Effect: Evidence in the documents appears to support the long-theorised concept of a “sandbox” period where new websites or pages might face temporary ranking limitations (potentially linked to attributes like “hostAge”).
  • Domain Authority/Whitelists: The documents mention features like “siteAuthority” and suggest the existence of whitelists, potentially giving preferential treatment to certain sites, especially for sensitive topics like news, politics, travel, or health.
  • “Twiddlers”: The concept of “Twiddlers” suggests Google uses small, temporary adjustments or re-ranking features separate from major algorithm updates.
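To picture what a re-ranking step like a “Twiddler” might look like, here is a purely illustrative sketch in Python. It is not Google's code and is not taken from the leaked documents: the Result class, the boost_domain function, the scores and the example URLs are all hypothetical. It only shows the general idea of adjusting an already-ranked list after the main scoring has happened.

```python
from dataclasses import dataclass, replace
from typing import Callable, List
from urllib.parse import urlparse


@dataclass(frozen=True)
class Result:
    """A hypothetical search result with a score from an earlier ranking stage."""
    url: str
    score: float


# In this sketch, a "twiddler" is any function that takes a list of
# results and returns an adjusted list. This is an assumption for illustration.
Twiddler = Callable[[List[Result]], List[Result]]


def boost_domain(domain: str, factor: float) -> Twiddler:
    """Build a twiddler that multiplies the score of results on one host."""
    def apply(results: List[Result]) -> List[Result]:
        return [
            replace(r, score=r.score * factor)
            if urlparse(r.url).hostname == domain
            else r
            for r in results
        ]
    return apply


def rerank(results: List[Result], twiddlers: List[Twiddler]) -> List[Result]:
    """Apply each twiddler in turn, then sort by the adjusted score."""
    for twiddler in twiddlers:
        results = twiddler(results)
    return sorted(results, key=lambda r: r.score, reverse=True)


initial = [
    Result("https://a.example/page", 0.9),
    Result("https://b.example/page", 0.8),
]
final = rerank(initial, [boost_domain("b.example", 1.5)])
print([r.url for r in final])
```

In this toy example the second result overtakes the first once the boost is applied, even though the underlying scores never changed. That is the kind of adjustment the bullet above describes, separate from any major algorithm update.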

There is far more in the leak than I’ve summarised here; this is a distillation of roughly 2,500 documents. Mike King’s analysis goes deeper and is a great read for serious SEOs.

Google’s Reaction

Google acknowledged the leak but urged caution, stating the information could be outdated, incomplete, or lack necessary context. They emphasised that drawing definitive conclusions based solely on the leaked documents would be misguided.

Conclusion

This leak provides an unprecedented, albeit potentially incomplete or outdated, glimpse into Google’s ranking systems. It highlights apparent discrepancies between Google’s internal mechanisms and its public communications, prompting the SEO industry to re-evaluate long-standing assumptions about how Google Search works.

Digital marketers must stay informed as the community continues to analyse the data and observe potential shifts in search results. My best advice is to monitor and record your own work to see what’s effective and what’s not.

If you need your brand website to perform better in search, speak with me, your local SEO consultant. Call me on 07730 499 539 or leave a message on my contact form.
