Google Technical guidelines

Use a text browser such as Lynx to examine your site, because most spiders see your site much as Lynx would. If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.
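As a rough stand-in for a text browser, you can strip a page down to its visible text and check that your content survives. The sketch below uses only the Python standard library; the class and function names are illustrative, not part of any real tool, and a real check would fetch your live pages rather than a string.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style blocks,
    roughly as a text browser like Lynx would render a page."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())

def text_view(html: str) -> str:
    """Return the visible text of an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

If `text_view` returns little or nothing for a page whose content is injected by JavaScript, a crawler that does not execute scripts may see little or nothing as well.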

Allow search bots to crawl your site without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using them may result in incomplete indexing of your site, because bots may not be able to eliminate URLs that look different but actually point to the same page.

Make sure your web server supports the If-Modified-Since HTTP header. This feature allows your web server to tell Google whether your content has changed since the site was last crawled. Supporting this feature saves you bandwidth and overhead.

Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it's current for your site so that you don't accidentally block the Googlebot crawler. Consult the robots exclusion standard to learn how to instruct robots when they visit your site. You can test your robots.txt file to make sure you're using it correctly with the robots.txt analysis tool available in Webmaster Tools.

If your company buys a content management system, make sure that the system can export your content so that search engine spiders can crawl your site.

Use robots.txt to prevent crawling of search results or other auto-generated pages that don't add much value for users coming from search engines.
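The If-Modified-Since mechanism works by comparing the client's cached timestamp against the resource's last modification time: if nothing has changed, the server answers 304 Not Modified with an empty body instead of resending the page. A minimal sketch of that decision, assuming a hypothetical `status_for` helper and timestamps in standard HTTP date format:

```python
from email.utils import parsedate_to_datetime

def status_for(last_modified: str, if_modified_since: str | None) -> int:
    """Decide between 200 (send full body) and 304 (Not Modified).

    last_modified: the resource's Last-Modified timestamp.
    if_modified_since: the If-Modified-Since header the crawler sent,
    or None if it sent no conditional header.
    """
    if if_modified_since is None:
        return 200  # unconditional request: always send the full page
    changed_at = parsedate_to_datetime(last_modified)
    cached_at = parsedate_to_datetime(if_modified_since)
    # Unchanged since the crawler's copy: skip the body, save bandwidth.
    return 304 if changed_at <= cached_at else 200
```

In practice your web server (Apache, nginx, etc.) handles this for static files automatically; the logic above only matters if you generate responses dynamically.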
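You can check how a given robots.txt would be interpreted without waiting for a crawler to visit. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content is a made-up example that blocks auto-generated search-result pages while leaving the rest of the site crawlable.

```python
from urllib import robotparser

# Hypothetical robots.txt: block /search (auto-generated results),
# allow everything else for all user agents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())
```

Calling `rp.can_fetch("Googlebot", url)` then tells you whether that URL is crawlable under these rules, which is a quick way to confirm you haven't accidentally blocked Googlebot from pages you care about.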
