
Manipulating Search Engine User Agents
A search engine optimisation (SEO) specialist can tell a search engine user agent (webbot, spider or crawler) what to crawl and index, and what to leave alone, using a robots.txt file (the Robots Exclusion Protocol), on-page robots meta tags and nofollow attributes on links. This b-1st blog describes some of the SEO commands you can use to control what a search engine indexes from your ecommerce website.
It is useful to instruct a search engine not to crawl and index a page that is under construction. Adding the nofollow attribute to a link asks search engine spiders not to follow that link. It takes the form:
<a href="http://www.healthstore.uk.com" rel="nofollow">Health Store</a>
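To illustrate what a spider does with this attribute, here is a minimal Python sketch of a crawler that collects links from a page and sets aside any whose rel attribute contains nofollow. The class name FollowableLinkParser and the /products/vitamins path are hypothetical, chosen only for the example; real search engines apply their own, more involved link-handling rules.

```python
from html.parser import HTMLParser


class FollowableLinkParser(HTMLParser):
    """Hypothetical crawler helper: splits <a href> links into followable and nofollow."""

    def __init__(self):
        super().__init__()
        self.followable = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. rel="nofollow noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followable.append(href)


# Hypothetical page snippet; /products/vitamins is an illustrative path.
page = (
    '<a href="http://www.healthstore.uk.com" rel="nofollow">Health Store</a>'
    '<a href="/products/vitamins">Vitamins</a>'
)
parser = FollowableLinkParser()
parser.feed(page)
print(parser.followable)  # ['/products/vitamins']
print(parser.nofollow)    # ['http://www.healthstore.uk.com']
```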
The "nofollow" value can also be used in a robots meta tag placed in the head of a webpage. The following tells search engines not to index this page and not to follow any of its links for use in indexing or weighting...
<meta name="robots" content="noindex, nofollow" />
The following tells a spider not to index this page, but still to follow its links so that the pages they point to can be indexed and weighted...
<meta name="robots" content="noindex, follow">
The following tells the spider to index this page but not to follow any links from it. It is commonly used on message boards...
<meta name="robots" content="index, nofollow">
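The three meta tag combinations above can be summarised as a pair of yes/no decisions a spider makes for each page. The Python sketch below reads a robots meta tag and returns (may_index, may_follow), assuming that a missing directive means "allowed"; the helper names RobotsMetaParser and robots_policy are hypothetical and only cover the index/noindex and follow/nofollow values discussed here.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: records directives from <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma-separated; compare them case-insensitively.
            self.directives.update(
                token.strip().lower() for token in content.split(",") if token.strip()
            )


def robots_policy(html):
    """Return (may_index, may_follow); both default to True when not forbidden."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return ("noindex" not in parser.directives, "nofollow" not in parser.directives)


print(robots_policy('<meta name="robots" content="noindex, nofollow" />'))  # (False, False)
print(robots_policy('<meta name="robots" content="noindex, follow">'))      # (False, True)
print(robots_policy('<meta name="robots" content="index, nofollow">'))      # (True, False)
```

Treating absent directives as permissive mirrors the usual default, where a page with no robots meta tag may be both indexed and followed.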
The Robots Exclusion Protocol uses a separate robots.txt file, located in the site's root directory, to keep search engine spiders out of specified directories.
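Because the file always sits at the root of the site, a spider can work out where to look for it from any page address. A minimal Python sketch, assuming the site is served over https; the function name robots_txt_url and the /products/vitamins page are hypothetical examples.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Hypothetical helper: build the robots.txt URL for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (plus any port); replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.healthstore.uk.com/products/vitamins?page=2"))
# https://www.healthstore.uk.com/robots.txt
```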
The following instruction tells every search engine that NO directories are disallowed, so the whole site may be crawled.
User-agent: *
Disallow:
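You can check how a standards-following crawler reads these two lines with Python's standard-library urllib.robotparser module. In this sketch the crawler names and the /checkout/basket path are only sample inputs; an empty Disallow value matches nothing, so every user agent may fetch every path.

```python
from urllib.robotparser import RobotFileParser

allow_all = """\
User-agent: *
Disallow:
"""

rules = RobotFileParser()
rules.parse(allow_all.splitlines())

# Sample crawlers and paths; /checkout/basket is a hypothetical page.
for agent in ("Googlebot", "Bingbot"):
    for path in ("/", "/checkout/basket"):
        print(agent, path, rules.can_fetch(agent, path))  # True for every pair
```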
Conversely, the following command will disallow ALL directories for any search engine.
User-agent: *
Disallow: /
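Running the same standard-library check against this version shows the difference a single slash makes: every URL path begins with "/", so the rule matches the entire site. The paths below are hypothetical samples.

```python
from urllib.robotparser import RobotFileParser

disallow_all = """\
User-agent: *
Disallow: /
"""

rules = RobotFileParser()
rules.parse(disallow_all.splitlines())

# Hypothetical sample paths; each starts with "/", so each is blocked.
paths = ("/", "/index.html", "/checkout/basket")
blocked = [path for path in paths if not rules.can_fetch("Googlebot", path)]
print(blocked)  # ['/', '/index.html', '/checkout/basket']
```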
