For years, Google has told us that it doesn’t index all the content and URLs it knows about on the web. Not just because a directive tells it not to, but because Google chooses not to index those pages based on factors like PageRank, duplication, and other quality signals. But a WebmasterWorld thread is asking: should Google index the whole web?
Here is a tweet from John back in 2015:
— John (@JohnMu) August 19, 2015
Here is a similar tweet from just a few days ago:
One thing to add here – we don't index all URLs on the web, so even once it's reprocessed here, it would be normal that not every URL on every site is indexed. Awesome sites with minimal duplication help us recognize the value of indexing more of your pages.
— John (@JohnMu) April 7, 2019
The messaging for years has been clear – Google doesn’t want to index the whole web.
Why not index the whole web?
Well, like any company, Google has limited resources. Yes, Google’s resources are far greater than those of 99% of the companies out there, but they are still limited. It wouldn’t be efficient or productive to index every URL, because Google may be able to tell ahead of time that many of those URLs duplicate other URLs on the same site or on outside sites. Or it may be that a URL is doing something shady and doesn’t deserve to be in Google’s index. Or maybe Google doesn’t think the quality signals of that URL justify crawling it fully and indexing it. Google is about efficiency, and when it comes to crawling, Google has described how it determines how much of a site it indexes and how fast it indexes that site: it is called crawl budget.
Again, it is not new that Google doesn’t index the whole web.
Why should Google index the whole web?
That is where we go to WebmasterWorld, where the founder Brett Tabke says it shows a lack of commitment to their mission. He wrote:
It questions Google’s commitment to search. (eg: their only profitable endeavor). We live in the insulated bubble of search marketing that we need to pop our head up every once-in-awhile to see what the general public is thinking. Even a tech trade rage like Lifehacker have no idea what goes on in search. For them to notice Google is no longer indexing all the content, is significant.
Because they are impacting quality. They’ve reached a tipping point of “non indexation” that even tech people are starting to notice.
They have also crammed so many in-site links on the to serps now, that organic exposure is falling fast. Thus, they are killing off swaths of the web (starting with older content).
I guess Brett is saying that if Google technically can index the whole web, it has a commitment to do so in order to serve its overall goal of organizing the world’s information, and that by excluding some of that information, it is not serving that purpose?
To be clear, Google does have a site called How Search Works, where it says, “The Google Search index contains hundreds of billions of webpages and is well over 100,000,000 gigabytes in size.” It also states its mission:
From the beginning, our mission has been to organize the world’s information and make it universally accessible and useful. Today, people around the world turn to Search to find information, learn about topics of interest, and make important decisions. We consider it a privilege to be able to help. As technology continues to evolve, our commitment will always be the same: helping everyone find the information they need.
What do you think?
Forum discussion at WebmasterWorld.