What is robots.txt?
Robots.txt is a plain text (not HTML) file that you place on your site to tell search robots which pages you would like them not to visit. Search engines are not obliged to honor robots.txt, but in general they obey what they are asked not to do. It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site (i.e. it is not a firewall or a kind of password protection). Putting up a robots.txt file is like putting a note reading "Please, do not enter" on an unlocked door: you cannot prevent thieves from coming in, but the good guys will not open the door and enter. That is why, if you have really sensitive data, it is naive to rely on robots.txt to keep it from being indexed and displayed in search results.

The location of robots.txt is very important. It must be in the main (root) directory of the site, because otherwise user agents (search engines) will not be able to find it. They do not search the whole site for a file named robots.txt. Instead, they look only in the main directory, and if they do not find it there, they simply assume that the site has no robots.txt file and index everything they find along the way. So if you do not put robots.txt in the right place, do not be surprised when search engines index your whole site.
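To make the location rule concrete, here is a minimal Python sketch of how a crawler might work out where to look for robots.txt, given the address of any page on a site. The function name `robots_url` and the example.com address are made up for illustration; real crawlers may handle edge cases differently.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the only address where a crawler looks for robots.txt.

    Whatever page the crawler starts from, the path is replaced with
    /robots.txt in the main directory; subdirectories are never searched.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, drop the path, query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("https://example.com/blog/2010/post.html?page=2"))
```

A file saved as /blog/robots.txt would never be requested, which is why it has no effect.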
Why is it used?
It is great when search engines visit your site frequently and index your content, but there are often cases when indexing parts of your online content is not what you want. If you happen to have sensitive data on your site that you do not want the world to see, you will also prefer that search engines do not index these pages (although in this case the only sure way to keep sensitive data from being indexed is to keep it offline on a separate machine). Additionally, if you want to save some bandwidth by excluding images, style sheets and JavaScript from indexing, you also need a way to tell spiders to keep away from these items.

One way to tell search engines which files and folders on your Web site to avoid is the Robots Meta tag. But since not all search engines read Meta tags, the Robots Meta tag can simply go unnoticed. A better way to inform search engines of your wishes is to use a robots.txt file.

Structure of robots.txt:
The structure of a robots.txt file is pretty simple (and barely flexible): it is a list, as long as you need, of user agents and disallowed files and directories. Basically, the syntax is as follows:

User-agent:
Disallow:

Here "User-agent:" names the search engine crawler the rule applies to, and "Disallow:" lists the files and directories to be excluded from indexing. In addition to "User-agent:" and "Disallow:" entries, you can include comment lines. Just put the # sign at the beginning of the line:

# All user agents are disallowed to see the /temp directory.
User-agent: *
Disallow: /temp/