Feb 15

What is Robots.txt?

The robots.txt file tells search engine robots which pages on your website they may crawl, and consequently which pages get indexed. Most websites have files and folders that are not relevant to search engines (such as images or admin files), so creating a robots.txt file can actually improve how your website is indexed. It also gives you a way to keep content out of search engines.

Robots.txt Syntax

# comment
User-agent: [robot name] | [* (wildcard, matches all robots)]
Disallow: [/ (everything)] | [specific directory] | [specific file location]

User-agent

The value of this field is the name of the robot whose access policy the record describes. If more than one User-agent field is present, the record describes an identical access policy for each of those robots. At least one User-agent field needs to be present per record, and you can list multiple (more than one) User-agent fields in one record.
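
As a rough sketch of how one record can cover several robots, the snippet below feeds a small rule set to Python's standard urllib.robotparser module. The robot names AlphaBot, BetaBot, and GammaBot and the /private/ path are made up for illustration; real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical record: two User-agent lines share one Disallow rule.
SHARED_RECORD = """\
User-agent: AlphaBot
User-agent: BetaBot
Disallow: /private/
"""

def can_crawl(rules: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)

if __name__ == "__main__":
    for bot in ("AlphaBot", "BetaBot", "GammaBot"):
        print(bot, can_crawl(SHARED_RECORD, bot, "/private/report.html"))
```

Both named robots are covered by the same record, while a robot that is not listed (and has no wildcard record) is not restricted.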

Disallow

The value of this field specifies a partial URL that is not to be visited. This can be a full path or a partial path; any URL that starts with this value will not be retrieved. For example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html. An empty value indicates that all URLs can be retrieved. At least one Disallow field needs to be present in a record.
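
The prefix matching described above can be checked with a minimal sketch built on Python's standard urllib.robotparser module. The helper name is_allowed and the agent name ExampleBot are assumptions made for this example, not part of any robots.txt convention.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(disallow_value: str, path: str, agent: str = "ExampleBot") -> bool:
    """Check `path` against a one-record robots.txt with a single Disallow value."""
    rules = ["User-agent: *", f"Disallow: {disallow_value}".rstrip()]
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(agent, path)

if __name__ == "__main__":
    # "/help" blocks every URL that starts with /help.
    print(is_allowed("/help", "/help.html"), is_allowed("/help", "/help/index.html"))
    # "/help/" blocks only URLs inside the /help/ directory.
    print(is_allowed("/help/", "/help.html"), is_allowed("/help/", "/help/index.html"))
    # An empty value blocks nothing.
    print(is_allowed("", "/help/index.html"))
```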

Things to remember while writing robots.txt:

  1. Robots.txt should be written in a plain text editor like Notepad. Do not use MS Word or any other word processor to create robots.txt. The bottom line is that this file should have the extension ".txt", or else it will be useless.
  2. A robots.txt file is always stored in the root of your site, and is always named in lower case. Spiders will always search for it in the root directory (e.g. http://www.example.com/robots.txt).
  3. There can only be one instruction per line.
  4. You should avoid putting spaces before the instructions (recommended to avoid making mistakes).
  5. For security reasons, be careful when you list sensitive and private areas of your site to keep spiders from indexing them, because anybody at all can view your robots.txt file.
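
To illustrate point 2, here is a small sketch of how the robots.txt location can be derived from any page address on a site: keep the scheme and host, and replace everything else with /robots.txt. The function name robots_url is hypothetical, and the page address is only an example.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site that serves `page_url`."""
    parts = urlsplit(page_url)
    # Drop the page path, query string, and fragment; keep scheme and host.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

if __name__ == "__main__":
    print(robots_url("http://www.example.com/blog/2010/post.html?page=2"))
```

Because the file name is always lower case and always at the root, no information about the page itself is needed beyond its scheme and host.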


Nov 16

You may be wondering what could be wrong here: the way a website looks to me is the way it looks to everyone else. You are right, but only when we talk about humans, not search engine spiders. Spiders are not human beings who look at your site the way you do. They are automated programs that extract only the text and links from your web page. So it does not matter how glossy your images are, which colors you have used, or what you have written on top of your images. Now I can understand your curiosity, so let me tell you about a browser that shows you a site the same way search engine spiders see it while crawling.
Visit http://www.delorie.com/web/lynxview.html to view the text version of any site. Most spiders see your site much as Lynx would, so make sure that search engine spiders are able to see your site correctly. This also helps with search engine optimization and end-user accessibility.
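
To get a feel for what "only the text and links" means, here is a minimal sketch using Python's standard html.parser module. It is not how Lynx or any real spider works internally; the class name SpiderView, the function spider_view, and the sample page are assumptions made for illustration, and for simplicity the sketch ignores image alt text.

```python
from html.parser import HTMLParser

class SpiderView(HTMLParser):
    """Collect visible text and link targets, ignoring images, scripts, and styles."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())

def spider_view(html: str):
    """Return (text, links) for an HTML string."""
    parser = SpiderView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text), parser.links

if __name__ == "__main__":
    sample = (
        '<html><body><h1>Welcome</h1><img src="banner.png">'
        '<p>Read our <a href="/about.html">about page</a></p>'
        "<script>var x = 1;</script></body></html>"
    )
    print(spider_view(sample))
```

The banner image contributes nothing to the result, which is why any message that exists only inside an image is invisible in this kind of view.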

Terms Used:
Search Engine Optimization (SEO) - refers to a set of techniques used to improve the visibility of a website in search engine listings, primarily by optimizing for the keyword phrases that web users commonly type when searching for information, services, or products.