What is a robots.txt file and what is it for?

Rare is the day that we do not use Google to search for some kind of information. The most famous search engine in the world offers solutions for all areas of our lives and has become an everyday tool. Search engines, however, need information: they are curious by nature and want to know absolutely everything about our lives and web pages. In short, they are greedy for knowledge, hence the importance of knowing what a robots.txt file is and how to use it.

Search engines have machines or robots that crawl the web to classify and index as much information as possible in their databases. These robots (also called bots, crawlers or spiders) are widely used on the web to index content; spammers, for example, use them to harvest email addresses. The robots.txt file is how a website gives instructions to these robots, and it has many more uses, such as pointing crawlers to the XML sitemap or blocking access to code files and directories.

The world of robots.txt is exciting, and today we are going to try to shed a little light on the subject: how the robots.txt file works, what you need to know about it, and how you should deal with it.

What is a robots.txt file

When we create a new website, we need Google to be able to access our pages and crawl our information. To carry out this task, it is necessary to create a text file (with a .txt extension) in our domain, which provides the search engine with all the information that we want it to know about our website or business. At the same time, this .txt file is used to prevent bots from adding data and information that we do not wish to share with the Mountain View company. According to Google, the definition of a robots.txt file is as follows:


“A robots.txt file is a file at the root of a site that indicates those parts of the site you don’t want accessed by search engine crawlers. The file uses the Robots Exclusion Standard, which is a protocol with a small set of commands that can be used to indicate access to your site by section and by specific kinds of web crawlers (such as mobile crawlers or desktop crawlers).”
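
To make this concrete, here is a minimal sketch of what such a file might contain (the /private/ path is a placeholder, not part of any real site):

  # Applies to every crawler
  User-agent: *
  # Keep crawlers out of one hypothetical directory
  Disallow: /private/

Each group starts with a User-agent line, followed by the rules that apply to that crawler.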

How the robots.txt file works

The operation of a robots.txt file is simpler than it seems. The first thing we need to know is what the robots.txt file is for and which elements of our website search engines may or may not index.

In addition, what robots.txt can do is limited, and there are other ways for our web addresses to end up being found on the web.

Please note that the instructions in the robots.txt file are merely guidelines, not binding rules. For example, Google's robots, called Googlebots, do obey the commands in the robots.txt file, but the crawlers of other search engines (Alltheweb, ASK or Altavista) are not obliged to do so.

For example, Google will not crawl or index the content of pages that we block with robots.txt; however, it will still index the addresses or URLs of those pages if they appear in other elements or web pages, even though they are restricted within the .txt file. Therefore, an important piece of advice: if your web page is going to contain sensitive information that you do not want to share, it is best not to publish it at all, since robots.txt alone will not keep it out of the index.

Two types of robots: user-agents and Googlebots

Google differentiates several classes of robots:

  • The user-agent, which is used specifically to identify robots and give them instructions. To address all robots at once, the following command is added: User-agent: *
  • The rest of the robots are Googlebots: Googlebot-Mobile (specific to mobile devices) and Googlebot-Image, which handles images and photography. You can see how these names come into play in the sketch below.
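
As an illustration, a robots.txt file can contain separate groups for different crawlers (both paths here are hypothetical):

  # Rules for every crawler
  User-agent: *
  Disallow: /private/

  # Extra rule only for Google's image crawler
  User-agent: Googlebot-Image
  Disallow: /photos/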

The Disallow command

If we want to limit what these robots index, we must use the “Disallow” command. For example, if we want to remove some content from our website, we will put the following (see the sketch after this list for a complete group):

  • To block the whole site, we put a slash, like this: Disallow: /
  • If we want to block a directory and everything inside it, we put the following: Disallow: /marketing/
  • To block a single page, its path is put after Disallow, like this: Disallow: /marketing/page.html (the page name here is illustrative)
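
As mentioned above, here is a hedged sketch of a complete group (the paths are placeholders); note that Disallow rules only take effect after a User-agent line:

  User-agent: *
  # Block a directory and everything under it
  Disallow: /marketing/
  # Block one specific, hypothetical page
  Disallow: /marketing/page.html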

With respect to images, they are removed in the following way:

  • If you only want to remove a single image: User-agent: Googlebot-Image
    Disallow: /images/marketing-and-electronic-commerce.jpg
  • If you want to remove all the images on your site from Google Images, include the following commands:
    • User-agent: Googlebot-Image
      Disallow: /
  • To block files of a certain type (for example, .gif), you can include the following command (the * wildcard and the $ end-of-URL anchor are pattern-matching extensions that Googlebot supports):
    • User-agent: Googlebot
      Disallow: /*.gif$

Other commands that are also used a lot

  • Sitemap – Indicates where the XML sitemap is located.
  • Allow – Works the other way around from the Disallow command, as it grants access to directories and pages. It can also be used, in whole or in part, to override the Disallow command.
  • Crawl-delay – Tells the bot the number of seconds to wait between each page load. It is quite commonly used to reduce the load on the server, although note that Googlebot ignores this directive. These commands are combined in the sketch after this list.
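
Here is the sketch mentioned above, combining these commands; the domain and paths are placeholders:

  User-agent: *
  # Block a directory but allow one file inside it
  Disallow: /assets/
  Allow: /assets/logo.png
  # Ask crawlers that honor it to wait 10 seconds between requests
  Crawl-delay: 10

  # Sitemap location: a full URL, declared outside any User-agent group
  Sitemap: https://www.example.com/sitemap.xml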

How the robots.txt file is created

We said above that working with the robots.txt file is very easy. To create it, you need access to the root of the domain: upload the file in text format (.txt), with the name “robots.txt”, to the first-level root directory of the server where the web page you want indexed is located.


Do not forget to use a plain text editor to create the file; both Windows and Mac include editors that can save plain text. Once uploaded, the file should be reachable at the root of your domain, at an address like this one (example.com stands in for your own domain): https://www.example.com/robots.txt

Finally, you must check that your robots.txt file works. For this, Google provides a test tool in Google Search Console, where you can see how Googlebot will read the file and be informed of any errors it may contain.

In case you need more information on the subject, I recommend that you consult Google's own documentation, which explains everything you need to know about the operation of a robots.txt file. And what do you think about restricting information on your website from Google? Do you really think it is an effective system? Leave us your opinion in the comments and we will be happy to answer you.

