Thread: Robot.txt
19th September 2006, 05:49 AM
backstage

Another good thing about the robots.txt file is that it lets you exclude specific robots, so you can keep Googlebot away from a particular page while still letting Slurp (Yahoo's crawler) crawl it.
This can be useful if you have optimized different pages for separate search engines. Doing so gives you flexibility, but a search engine that sees both versions may think you have duplicate pages and penalize you. Follow these instructions to use the robots.txt file.
Open Notepad (or any plain-text editor) and type in the following lines:

User-Agent: Slurp
Disallow: /whatsisname.html
Disallow: /page_optimized_for_google.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html

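Save it as robots.txt and upload it to the root directory of your site, so that it is reachable at /robots.txt. Each path must start with a slash and is relative to the site root. You can disallow as many pages for each crawler as you want, one Disallow line per page. To disallow pages for another crawler, leave a blank line and start a new group with its own User-Agent line.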

User-Agent: Slurp
Disallow: /whatsisname.html
Disallow: /page_optimized_for_google.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html

User-Agent: Googlebot
Disallow: /page_optimized_for_yahoo.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html
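
If you want to check a file like this before uploading it, one option is Python's standard urllib.robotparser module. The following is only a rough sketch: it assumes the paths are written with a leading slash (as the robots.txt format expects), and the helper name build_parser is made up for illustration. Real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The two-crawler example, with root-relative paths (leading slash).
ROBOTS_TXT = """\
User-Agent: Slurp
Disallow: /whatsisname.html
Disallow: /page_optimized_for_google.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html

User-Agent: Googlebot
Disallow: /page_optimized_for_yahoo.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html
"""


def build_parser(robots_text):
    """Hypothetical helper: parse robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Slurp is kept away from the Google-optimized page, Googlebot is not.
print(parser.can_fetch("Slurp", "/page_optimized_for_google.html"))      # False
print(parser.can_fetch("Googlebot", "/page_optimized_for_google.html"))  # True

# The reverse holds for the Yahoo-optimized page.
print(parser.can_fetch("Googlebot", "/page_optimized_for_yahoo.html"))   # False
print(parser.can_fetch("Slurp", "/page_optimized_for_yahoo.html"))       # True
```

Each group applies only to the crawler named on its User-Agent line, which is why the same page can be blocked for one robot and open to the other.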

If you want the rules to apply to all crawlers, replace the name of the user agent with the wildcard (*), so the first line of the group reads User-Agent: *
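
As a quick illustration of the wildcard, here is a minimal sketch using Python's standard urllib.robotparser. The rule set, the crawler name AnyOtherBot and the helper is_allowed are invented for the example, and the paths assume the leading-slash form.

```python
from urllib.robotparser import RobotFileParser

# One group that applies to every crawler.
WILDCARD_RULES = """\
User-Agent: *
Disallow: /credit_card_list.html
Disallow: /whatnot.html
"""


def is_allowed(robots_text, agent, path):
    """Hypothetical helper: True if `agent` may fetch `path` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, path)


# Every crawler name falls under the wildcard group.
print(is_allowed(WILDCARD_RULES, "Googlebot", "/whatnot.html"))    # False
print(is_allowed(WILDCARD_RULES, "Slurp", "/whatnot.html"))        # False

# Pages that are not listed stay crawlable.
print(is_allowed(WILDCARD_RULES, "AnyOtherBot", "/index.html"))    # True
```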

robots.txt can help you avoid duplicate-content penalties from search engines, and it can also be used to spot crawlers when they visit. Normally only crawlers request robots.txt, and these requests show up in the server logs.
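
To pick those requests out of a log, something like the sketch below could work. It assumes the server writes the common "combined" access log format. The regular expression, the function name robots_txt_requesters and the sample log lines are all made up for illustration, so adjust them to whatever your server actually logs.

```python
import re
from collections import Counter

# Assumed layout (combined log format):
# ip ident user [time] "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_txt_requesters(log_lines):
    """Count /robots.txt requests per (ip, user-agent) pair."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        path = match.group("path").split("?")[0]  # ignore any query string
        if path == "/robots.txt":
            hits[(match.group("ip"), match.group("agent"))] += 1
    return hits


# Invented sample lines, using documentation-only IP addresses.
SAMPLE_LOG = [
    '192.0.2.10 - - [19/Sep/2006:05:49:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [19/Sep/2006:05:49:02 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [19/Sep/2006:06:01:13 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

for (ip, agent), count in robots_txt_requesters(SAMPLE_LOG).items():
    print(ip, agent, count)  # 192.0.2.10 ExampleBot/1.0 1
```

Any address that shows up here is very likely a robot, even if its user-agent string claims to be a normal browser.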