Another advantage of the robots.txt file is that it lets you exclude specific robots, so you can block Googlebot from a particular page while still allowing Slurp to crawl it.
This can be useful if you have optimized different pages for different search engines. Doing so gives you flexibility, but a search engine that sees both versions may decide you have duplicate pages and penalize you. Follow these instructions to use the robots.txt file.
Open Notepad and type in the following lines:
User-Agent: Slurp
Disallow: /whatsisname.html
Disallow: /page_optimized_for_google.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html
Save it as robots.txt and upload it to your site's root directory. You can disallow as many pages for each crawler as you want. To disallow pages for a different crawler, start a new group with its own User-Agent line, conventionally separated from the previous group by a blank line. Each path is written from the site root, so it begins with a slash.
User-Agent: Slurp
Disallow: /whatsisname.html
Disallow: /page_optimized_for_google.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html

User-Agent: Googlebot
Disallow: /page_optimized_for_yahoo.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html
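Before uploading a file like this, it can help to check that each crawler is blocked from the pages you intended. The following is a minimal sketch using Python's standard urllib.robotparser module. The helper names load_rules and access_table are made up for this illustration, and the rules are written with a leading slash on every path, which this parser needs in order to match them. Real search engines may interpret edge cases differently from Python's parser, so treat the result as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# The two-crawler rule set, with each path written from the site root.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /whatsisname.html
Disallow: /page_optimized_for_google.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html

User-agent: Googlebot
Disallow: /page_optimized_for_yahoo.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html
"""


def load_rules(text):
    """Parse robots.txt text held in memory (hypothetical helper, no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def access_table(parser, agents, paths):
    """Return {(agent, path): allowed} for every agent/path combination."""
    return {(a, p): parser.can_fetch(a, p) for a in agents for p in paths}


rules = load_rules(ROBOTS_TXT)
table = access_table(
    rules,
    ["Slurp", "Googlebot"],
    [
        "/page_optimized_for_google.html",
        "/page_optimized_for_yahoo.html",
        "/credit_card_list.html",
    ],
)

for (agent, path), allowed in table.items():
    print(f"{agent:10} {path:35} {'allowed' if allowed else 'blocked'}")
```

Each crawler should show as blocked from the page optimized for the other engine and from the shared private pages, and allowed everywhere else.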
If you want to disallow pages for all crawlers, replace the name of the user agent with the wildcard character (*).
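As a rough illustration of the wildcard, the sketch below parses a one-group robots.txt that uses * as the user agent and checks it against several crawler names with Python's standard urllib.robotparser. The helper is_blocked and the crawler name SomeOtherBot are invented for this example, and the paths are assumed to start with a slash.

```python
from urllib.robotparser import RobotFileParser

# One group that applies to every crawler, thanks to the * wildcard.
WILDCARD_RULES = """\
User-agent: *
Disallow: /credit_card_list.html
Disallow: /whatnot.html
"""


def is_blocked(robots_text, agent, path):
    """Return True if `agent` may not fetch `path` (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return not parser.can_fetch(agent, path)


# SomeOtherBot is a made-up name standing in for any crawler not listed by name.
blocked = {
    agent: is_blocked(WILDCARD_RULES, agent, "/credit_card_list.html")
    for agent in ("Googlebot", "Slurp", "SomeOtherBot")
}
print(blocked)

# A page that is not listed stays open to everyone.
print(is_blocked(WILDCARD_RULES, "Googlebot", "/whatsisname.html"))
```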
Robots.txt helps you avoid being banned by search engines, and it can also be used to spot crawlers when they visit. Ordinary visitors almost never request robots.txt, so requests for it in the server logs usually come from crawlers.
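One way to pick those requests out of a log is sketched below. It assumes the server writes the common "combined" access log format used by Apache and nginx; if your server uses a different layout, the pattern will need adjusting. The function name robots_txt_requesters, the sample log lines, the IP addresses, and the crawler name ExampleCrawler are all made up for illustration.

```python
import re
from collections import Counter

# Assumed layout (combined log format):
# ip ident user [time] "METHOD path protocol" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def robots_txt_requesters(log_lines):
    """Count requests for /robots.txt, keyed by (ip, user agent)."""
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        path = match.group("path").split("?", 1)[0]  # ignore any query string
        if path == "/robots.txt":
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts


# Made-up sample lines; in practice, read them from your server's access log.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2024:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleCrawler/1.0"',
    '192.0.2.10 - - [01/Jan/2024:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "ExampleCrawler/1.0"',
    '198.51.100.7 - - [01/Jan/2024:10:05:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

requesters = robots_txt_requesters(SAMPLE_LOG)
for (ip, agent), hits in requesters.items():
    print(f"{ip} {agent!r} requested robots.txt {hits} time(s)")
```

The user agent string is supplied by the client and can be faked, so a result like this is a hint about who is crawling rather than proof.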