As the name suggests, a sitemap is a map of our website. It is a page, or a set of pages, that lists and links to all the other documents on the site. In theory, it gives our visitors a quick way to find what they are looking for without browsing the entire site.
Importance of Sitemaps:
It helps ensure that all our content-rich web pages are exposed to the search engine spider. It also gives us a way to distribute Google’s PageRank among all the pages that need it. One thing to keep in mind is that the sitemap should be linked to only from our homepage and from no other web page, because:
a) We want the search engine spider to find the links directly from the home page
b) PageRank can then spread quickly
XML Sitemap: The sitemap discussed above is mainly for visitors, since it appears on the website as a link for ease of navigation. XML sitemaps are meant for search engine spiders, so that pages can be discovered and cached quickly and indexing can start sooner.
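We can create an XML sitemap with a sitemap generator and upload it to the root folder of the website. Limit the number of pages listed on your sitemap to a maximum of about 30; otherwise search engines may mistake it for a link farm. Note that the sitemaps.org protocol itself allows a single XML sitemap file to list up to 50,000 URLs, so this rule of thumb is best read as applying to the visitor-facing sitemap page.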
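To make the idea of a sitemap generator concrete, here is a minimal sketch in Python that builds an XML sitemap using only the standard library. The function name build_sitemap and the example.com URLs are hypothetical, chosen for illustration; real generators usually crawl the site and add optional fields such as the last modification date.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return an XML sitemap document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # <loc> holds the full address of one page.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages, used only for illustration.
pages = [
    "https://www.example.com/",
    "https://www.example.com/about.html",
    "https://www.example.com/contact.html",
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

The resulting text would typically be saved as sitemap.xml and uploaded to the root folder of the website.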
On the server space where your website is hosted, you may have many pages or folders that you do not want search engine spiders to crawl. The robots.txt file exists for exactly this purpose. As the name suggests, it is meant for search engine robots, and its content tells the crawler which parts of the site it may crawl and which it should stay out of.
Now let us discuss the content of the robots.txt file. It contains two things: a user agent (User-agent) and a disallow rule (Disallow).
Here is a basic “robots.txt”:

    User-agent: *
    Disallow: /
With the above declared, all robots (indicated by “*”) are instructed not to crawl any of your pages (indicated by “/”). This is most likely not what you want, but you get the idea.
If we want all robots to crawl everything, we simply remove the “/”:

    User-agent: *
    Disallow:
Let’s get a little more discriminating now. While every webmaster loves Google, you may not want Google’s image bot crawling your site’s images and making them searchable online, if only to save bandwidth. The declaration below will do the trick:

    User-agent: Googlebot-Image
    Disallow: /
Many combinations are possible. It all depends on what we want the search engines to crawl and what we do not.
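Before uploading a robots.txt, it can help to check how a given combination of rules is interpreted. The sketch below uses Python's standard urllib.robotparser module to test the three rule sets described above: block everything, allow everything, and block only Google's image crawler. The helper name make_parser, the bot name "AnyBot", and the example.com URL are assumptions for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def make_parser(robots_txt):
    """Parse robots.txt text and return a parser that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# All robots ("*") are kept out of every page ("/").
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# An empty Disallow value lets all robots crawl everything.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# Only Google's image crawler is kept out; other robots are unaffected.
BLOCK_GOOGLE_IMAGES = """\
User-agent: Googlebot-Image
Disallow: /
"""

# Hypothetical URL, used only for illustration.
page = "https://www.example.com/photos/cat.jpg"

for name, rules in [
    ("block all", BLOCK_ALL),
    ("allow all", ALLOW_ALL),
    ("block Google images", BLOCK_GOOGLE_IMAGES),
]:
    parser = make_parser(rules)
    for bot in ("AnyBot", "Googlebot", "Googlebot-Image"):
        print(name, "|", bot, "|", parser.can_fetch(bot, page))
```

Each printed line shows whether the named robot would be allowed to fetch the page under that rule set, which makes it easy to try other combinations of User-agent and Disallow lines.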