Wednesday 30 November 2011

Using Robots.txt for SEO

Robots.txt tells all, or selected, search engine bots and crawlers not to access the whole of your website or parts of it. Used carefully, the robots.txt file can support your SEO efforts. Here is how.


What does robots.txt do?

Robots.txt is a plain text file that tells search engine bots, or crawlers, not to visit certain parts of your website. You may not want bots like Googlebot to see folders on your website that contain sensitive data. You can also keep duplicate content away from the eyes of search engines by disallowing the bots from accessing those pages. Bear in mind that robots.txt is only a request: reputable crawlers honor it, but it is not an access control mechanism.

Structure of robots.txt

The robots.txt text file is placed in the root folder of the website, along with the index file, and it is publicly accessible at http://www.example.com/robots.txt
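
A robots.txt file consists of one or more groups, each opened by a User-agent line and followed by one or more Disallow lines. Here is a minimal sketch, using a hypothetical /private/ folder (lines starting with # are comments):

# This group applies to every bot
User-agent: *
Disallow: /private/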

Disallowing Every Search Engine Bot

This is a simple robots.txt file that shuts out every search engine bot:

User-agent: *
Disallow: /

The '*' matches every search engine bot. The '/' matches every path on the site, so no folder or page may be crawled.

As you can see, this directive prevents search engines from crawling your website and listing it in search results. You would use it when your website is still under construction and you don't want search engines to index it yet.
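
Once the site goes live, remember to lift the block. An empty Disallow value means nothing is off-limits, so a minimal "allow everything" file looks like this:

User-agent: *
Disallow: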

Disallowing a Specific Search Engine

User-agent: Googlebot
Disallow: /

This will disallow Googlebot, Google's crawler, from accessing your site while allowing all other bots in. Note that the User-agent value must be the bot's advertised name (Googlebot, Bingbot and so on), not the company name.
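
Groups can also be combined in one file. Here is a sketch that shuts Googlebot out completely while explicitly leaving every other bot unrestricted:

User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: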

Disallowing Certain Parts of Your Website

You can disallow certain parts of your website by placing directives like the following in the robots.txt file:

User-agent: *
Disallow: /tags/
Disallow: /images/
Disallow: /tmp/

This will prevent search engine bots from crawling the folders named tags, images and tmp. You need to add a separate Disallow line for every folder you want to hide from the search engine spiders.

Search engines may wrongly conclude that your website has duplicate content if you use 'tags' or 'labels' to organize your articles. Snippets of a single article can appear under several different tag pages, which search engines read as duplicate content. Disallow access to the tag pages and the crawlers won't see the duplicates.
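
For example, on a Blogger blog like this one, label pages live under the /search/label/ path, so, assuming Blogger's default URL layout, a sketch that keeps them away from crawlers would be:

User-agent: *
Disallow: /search/label/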

Disallowing Specific Pages of Your Website

User-agent: *
Disallow: /www/pages/page1.html
Disallow: /www/pages/page2.html

This will prevent every search engine bot from accessing the specific pages page1.html and page2.html on your website. Note that the path in each Disallow line is relative to the domain root.
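
Major crawlers such as Googlebot also honor two non-standard pattern extensions: '*' matches any run of characters and '$' anchors the end of the URL. Assuming your target crawler supports them, a sketch that blocks every PDF on the site:

User-agent: *
Disallow: /*.pdf$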

Robots.txt Use Can Go Wrong

Even experienced web designers get robots.txt wrong. A common mistake is to disallow access to every search engine bot and then complain that the website is not showing up on search engine results pages.

Wrong use of the robots.txt file can undo the effects of your good SEO. So make sure you don't disallow parts of your website that you want Google and other search engines to index.

The original robots.txt standard has no directive to ALLOW access to a page or folder; anything you don't disallow is crawlable by default. However, major crawlers such as Googlebot support a non-standard Allow directive for making an exception to a Disallow rule.
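
A sketch of that exception pattern, with hypothetical folder and file names, assuming a crawler that understands Allow:

User-agent: *
Disallow: /images/
Allow: /images/logo.png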
