Configure robots.txt for Magento

Robots, also called crawlers or spiders, are programs that search engines use to scan your website and index its pages and content. The robots.txt file is very useful for controlling which of our pages and directories robots may or may not access: we decide which content we want indexed and which we do not. In addition, the robots.txt file helps us eliminate duplicate-content problems.

To configure the robots.txt file, the first thing you should know is its two main directives: User-agent and Disallow.

User-agent specifies which robot the rules apply to, such as “Googlebot” to target Google’s crawler, or “*” to address all robots.

Disallow indicates the content that we do not want search engines to crawl. Its scope depends on the path we specify, as the example after this list shows:

Disallow: / blocks access to the entire site.

Disallow: /forum/ blocks access to the /forum/ directory.

Disallow: (left empty) allows access to the entire site.
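As an illustration, here is a minimal robots.txt that combines both directives; the /private/ path is purely hypothetical:

# Hypothetical example: keep Googlebot out of /private/
User-agent: Googlebot
Disallow: /private/

# Every other robot may crawl everything (empty Disallow)
User-agent: *
Disallow:

Each robot obeys the most specific User-agent group that matches it, so Googlebot follows the first group while all other crawlers follow the second.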

You must take special care not to block web resources, such as JavaScript and CSS, that the Google spider needs in order to render your pages. Keep in mind that each CMS has its own structure, which affects how its robots.txt file should be written.

Here is Magento’s default robots.txt file, without modifications:

User-agent: *
Disallow: /index.php/
Disallow: /404/
Disallow: /admin/
Disallow: /api/
Disallow: /app/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalog/product_compare/
Disallow: /catalogsearch/
Disallow: /cgi-bin/
Disallow: /checkout/
Disallow: /contacts/
Disallow: /customer/
Disallow: /downloader/
Disallow: /install/
Disallow: /images/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /media/
Disallow: /newsletter/
Disallow: /pkginfo/
Disallow: /private/
Disallow: /poll/
Disallow: /report/
Disallow: /sendfriend/
Disallow: /skin/
Disallow: /tag/
Disallow: /var/
Disallow: /wishlist/

To configure the robots.txt file for Magento, you must not forget to adjust some of those lines, for example:
#Disallow: /js/
#Disallow: /lib/
#Disallow: /media/
#Disallow: /checkout/
#Disallow: /*.js$
#Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?SID=

With these indications we are telling Google that it may crawl the JavaScript and the CSS. We must not block these resources, so we can either remove those lines from the robots.txt file entirely or comment them out with “#”, as in the example above. The two remaining active rules still block URLs ending in .php and URLs that carry a session ID (?SID=), which helps avoid duplicate content.
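Alternatively, instead of deleting or commenting out those rules, you can grant the permission explicitly with Allow directives, which major search engines such as Google support. The following is only a sketch, assuming the default Magento paths shown above; the lines would go inside the existing “User-agent: *” group:

# Explicitly allow the static resources Google needs to render pages
Allow: /*.js$
Allow: /*.css$
Allow: /media/

Google documents that the most specific matching rule wins and, in case of a tie, the least restrictive one; even so, it is worth testing the result in its robots.txt Tester.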

It is important that Google can access and index the “media” folder, which contains all the multimedia files, images, and photographs on the site. We also need the search engine to be able to reach our “lib” folder, because blocking the library is not recommended.

Remember to check the changes in Google Webmaster Tools to make sure that Google has access to the JS and CSS files.

Be sure to check the “Blocked Resources” report to verify that no important files are being blocked. A list of blocked resources will appear; by clicking on each of them, Google will tell you what you should do to stop blocking it.

If anything is blocked, you need to modify the robots.txt file to unblock it. You can do it manually or with the help of SEO plugins.
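For example, Magento 1 keeps its theme CSS, JavaScript, and images under the “skin” folder, which the default file above disallows. If the report flags resources there, you can unblock them the same way; a small sketch, assuming the default file shown earlier:

# Comment out (or remove) the rule so theme CSS, JS and images are crawlable
#Disallow: /skin/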


Finally, crawl your website as Google would with the “Fetch as Google” tool. Run it with each of the options the tool offers and check how the search engine is really seeing your site.

When everything looks right, click the “Submit to index” button.

And that’s it! This way we make sure that Google can crawl our website without problems.
