Do you know the differences between crawling and indexing?

We are not fully aware of the enormous amount of information that the Internet offers us. All of this data is reviewed by Google's bots, which collect and classify it so it can be displayed in search results.

The process by which Google becomes aware of the information found on the Internet is called crawling. Below we explain in more detail what it consists of, as well as its main differences with respect to the term indexing, since the two are often confused.

What is crawling?

The Internet resembles a huge library in continuous growth, containing millions and millions of files. To find these public files, search engines crawl as many URLs as possible. The software responsible for discovering all the web pages that are publicly available is called a “web crawler”, and Google's best-known crawler is called “Googlebot”.

The crawl process begins with a list of web addresses obtained in previous crawls, together with the sitemap files created by website owners. When visiting each of them, the crawler follows the links it finds along the way, reaching URLs it already knew or, on the contrary, discovering new content.
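
As an illustrative sketch (the domain and URLs are hypothetical, not taken from any real site), a sitemap file is simply an XML list of the addresses we want crawlers to discover:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- Each <url> entry points crawlers to a page we want them to find -->
      <url>
        <loc>https://example.com/</loc>
      </url>
      <url>
        <loc>https://example.com/blog/crawling-vs-indexing</loc>
      </url>
    </urlset>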

This is where the so-called crawl budget comes into play. This is the time that Google allocates to crawling our page; depending on it, Google will be able to explore more or fewer pages of our site in each crawl.


Those web pages that can be crawled by search engines are commonly referred to as crawlable pages. On the other hand, those that Google's bots cannot reach are uncrawlable pages.

Why is it important for Google to crawl your website?

Once the content of our website is ready to go live on the Internet, we need to take the necessary steps so that Googlebot visits our site. This is vitally important, even more so considering that almost 90% of web traffic is channeled through Google. If our website is new, Google does not know about it yet, so we must make sure we capture its attention so that Google crawls our page.

Once we have made Google aware of the existence of our content through crawling, we move on to the next link in the chain: indexing. Once Google has discovered our website, it will include it in its index and classify it.

Differences between crawling and indexing

At this point, you may still have doubts about the differences between crawling and indexing. You must understand that these are two different parts of the process through which Google collects and stores the information found on our website.

These two concepts are closely related. On the one hand, crawlability defines the ability of the search engine to reach the content located on a given web page by crawling it. If your website has no crawlability problems, the “Google spiders” can easily reach your content. If, on the other hand, they find broken links or pages without internal linking, search engines may be unable to crawl your website properly.


On the other hand, indexability refers to the ability of search engines to add previously crawled pages to their content index. Once there, pages can be classified and, thanks to SEO techniques, our content becomes visible to users with a specific search intent. Thus, even if Google can crawl our entire website, it is not necessary for all of those URLs to be indexed.
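
As an illustrative sketch of this distinction (a generic HTML page, not any specific site), a URL can remain crawlable while being excluded from the index with a meta robots tag such as the following:

    <!-- Placed inside the <head> of the page: crawlers may visit it,
         but are asked not to add it to the index -->
    <meta name="robots" content="noindex, follow">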

We can distinguish the following scenarios for a URL on our website. Based on them, we tell Google how it should act:

  • Crawlable and indexable: URLs that Google can access and whose content it can read, and which may also end up indexed by search engines. Being crawlable does not mean a URL is always indexed, since it is up to Google to decide whether or not to index it.
  • Crawlable and non-indexable: Google can access the URL and read its content; however, because we tell the search engine that we do not want it indexed, it will not be shown in search results. This does not mean that Google stops visiting it frequently.
  • Non-crawlable and indexable: URLs that we do not want Google to access (normally blocked in the robots.txt file, as sketched after this list), so it cannot read the value of the meta robots tag we have assigned to them. However, they can still be indexed through other means (sitemaps, external links, etc.). This is the well-known case, visible in some Search Console projects, that Google reports as “Indexed, though blocked by robots.txt”.
  • Non-crawlable and non-indexable: URLs blocked from bot access and also marked so that they cannot (or should not) be crawled or indexed.
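
As a minimal, hypothetical sketch (the paths are invented for illustration), crawling is blocked through the robots.txt file placed at the root of the domain:

    # https://example.com/robots.txt (hypothetical example)
    User-agent: *
    Disallow: /private/    # no bot should crawl anything under /private/
    Disallow: /search      # internal search results are also blocked

    Sitemap: https://example.com/sitemap.xml

Keep in mind that blocking a URL here only prevents crawling; as described above, it can still end up indexed if other pages link to it.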

Now that you know the differences between crawling and indexing, we recommend that you carry out a study of your web project, so that you understand how each type of content on your website should be treated and can indicate to Google the guidelines to follow when going through your pages.

We work on web positioning projects, optimizing content to the fullest. In this way, we make sure that Google crawls our pages frequently, and we indicate which content we want it to include in its index. If you are interested and want to know more, do not hesitate to contact us.
