Tuesday, May 14, 2013

Robots.txt


Meta Robots Tags, About Robots.txt and Search Indexing Robots
User-agent: *
Disallow:
Because nothing is disallowed, everything is allowed for every robot.
User-agent: mybot
Disallow: /
mybot robot may not index anything, because the root path (/) is disallowed.
User-agent: *
Allow: /
For all user agents, everything is allowed.
User-agent: BadBot
Allow: /About/robot-policy.html
Disallow: /
The BadBot robot can see the robot policy document, but nothing else. All other user-agents are by default allowed to see everything. This only protects a site if "BadBot" actually follows the directives in robots.txt.
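As a quick check, a record like this can be run through Python's standard-library robots.txt parser (the example.com host is only a placeholder):

```python
from urllib import robotparser

# The BadBot record: one Allow line for the policy page, then Disallow all.
robots_txt = """\
User-agent: BadBot
Allow: /About/robot-policy.html
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# BadBot may fetch only the policy document...
print(rp.can_fetch("BadBot", "http://example.com/About/robot-policy.html"))  # True
print(rp.can_fetch("BadBot", "http://example.com/index.html"))               # False
# ...while agents with no matching record are allowed everything.
print(rp.can_fetch("GoodBot", "http://example.com/index.html"))              # True
```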
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /private
In this example, all robots can visit the whole site except the two directories mentioned and any path that starts with /private at the host root, including items in /privatedir/mystuff and the file /privateer.html.
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /*/private/*
The blank line starts a new "record" for a new user agent. All other robots can see everything except any subdirectory named "private" (using the wildcard character).
User-agent: WeirdBot
Disallow: /links/listing.html
Disallow: /tmp/
Disallow: /private/

User-agent: *
Allow: /
Disallow: /temp*
Allow: *temperature*

Disallow: /private/
This keeps the WeirdBot from visiting the listing page in the links directory, the tmp directory and the private directory.
All other robots can see everything except the temp directories or files, but should crawl files and directories named "temperature", and should not crawl private directories. Note that the robots will use the longest matching string, so temps and temporary will match the Disallow, while temperatures will match the Allow.
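Wildcards and the longest-match rule are not part of the original robots.txt standard (Python's urllib.robotparser does not implement them), but the behavior major crawlers document can be sketched in a few lines. This is a simplified model, assuming '*' matches any run of characters, the longest matching pattern wins, and Allow wins a tie:

```python
import re

def pattern_to_regex(pattern):
    # Translate a robots.txt path pattern into a start-anchored regex:
    # '*' matches any run of characters, a trailing '$' anchors the end.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    parts = [".*" if ch == "*" else re.escape(ch) for ch in body]
    return re.compile("".join(parts) + ("$" if anchored else ""))

def is_allowed(path, rules):
    # rules: list of (directive, pattern), directive is "allow" or "disallow".
    best = None  # ((pattern length, is-allow), directive) of best match so far
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            key = (len(pattern), directive == "allow")
            if best is None or key > best[0]:
                best = (key, directive)
    # No matching rule at all means the path is allowed.
    return best is None or best[1] == "allow"

rules = [("disallow", "/temp*"),
         ("allow", "*temperature*"),
         ("disallow", "/private/")]
print(is_allowed("/temps", rules))         # False: /temp* is the only match
print(is_allowed("/temperatures", rules))  # True: *temperature* is longer
```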
Bad Examples - Common Wrong Entries
Use one of the robots.txt checkers to see whether your file is malformed.
User-agent: googlebot
Disallow /
NO! This entry is missing the colon after the Disallow directive.
User-agent: sidewiner
Disallow: /tmp/
NO! Robots will ignore misspelled User Agent names (it should be "sidewinder"). Check your server logs for the exact User Agent name, or consult listings of User Agent names.
User-agent: MSNbot
Disallow: /PRIVATE
WARNING! Many robots and webservers are case-sensitive. So this path will not match any root-level folders named private or Private.
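Python's standard-library parser is case-sensitive about paths too, which makes this easy to demonstrate (example.com is a placeholder host):

```python
from urllib import robotparser

# An upper-case Disallow path: only the exact-case path is blocked.
rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: MSNbot
Disallow: /PRIVATE
""".splitlines())

print(rp.can_fetch("MSNbot", "http://example.com/PRIVATE/data.html"))  # False
print(rp.can_fetch("MSNbot", "http://example.com/private/data.html"))  # True (not blocked!)
```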
User-agent: *
Disallow: /tmp/
User-agent: Weirdbot
Disallow: /links/listing.html
Disallow: /tmp/
Robots generally read from top to bottom and stop when they reach a record that applies to them, so Weirdbot would probably stop at the first record, *.
If there's a specific User Agent block, robots don't check the * (all user agents) block, so any general directives should be repeated in the specific blocks.
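The same pitfall shows up in Python's standard-library parser, which selects a record by User Agent name rather than by position, but the practical advice is identical: repeat general directives in the specific block (example.com is a placeholder host):

```python
from urllib import robotparser

# A general record plus a specific one that forgets to repeat /tmp/.
rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /tmp/

User-agent: Weirdbot
Disallow: /links/listing.html
""".splitlines())

# Weirdbot matches its own record, so the general /tmp/ rule is ignored:
print(rp.can_fetch("Weirdbot", "http://example.com/tmp/x.html"))          # True (not blocked!)
print(rp.can_fetch("Weirdbot", "http://example.com/links/listing.html"))  # False
# Other robots fall back to the * record:
print(rp.can_fetch("otherbot", "http://example.com/tmp/x.html"))          # False
```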
