Improve Your Ranking by Controlling Search Engine Spiders

Now let's learn about the spiders that crawl our sites and blogs. Many of us use Google services such as Google Webmaster Tools, but have you used that service to its full potential? If you have, then you already know about robots.txt, the file spiders consult before crawling a site. Now let's get to the topic.

When it comes to getting your website listed at the top of the search engines' keyword rankings, it is essential to gain a deeper understanding of the search engine spiders that crawl your website. After all, it is the spiders that assess the relevance of your website and decide where your site will land on the search engine results page. By learning how to direct the spiders, you can help your website rise in the rankings.

Gaining Control with the Help of Robots.txt

You may think that gaining control of search engine spiders is an impossible task, but it is actually easier than you might think when you take advantage of a handy little tool called the robots.txt file. With the robots.txt file, you can give the spiders the direction they need to locate the most important pages on your website while preventing them from wasting time on the more obscure pages such as your About Us and Privacy Policy pages. After all, these pages won't do much to improve your search engine ranking and won't help your target market find your website, so why should the spiders waste their time exploring these pages when ranking your site? 
Implementing the Robots.txt Tool

To successfully use the robots.txt tool, you first need to determine which pages you don't want the spiders to search. Then, slowly begin making the changes to your site. By using the tool on only one or two pages at a time, you will be better capable of identifying mistakes that you may have made during the process.

To make your changes, you will need to add the robots.txt file to the root directory of your domain or of your subdomains. Adding it to your subdirectories will not work. For example, you may add the robots.txt file at a URL such as http://domain.com/robots.txt or http://privacypolicy.domain.com/robots.txt. But adding it to a subdirectory such as http://www.domain.com/privacypolicy/robots.txt will not work. With just one robots.txt file in your root directory, you can manage your entire site. If you have subdomains, however, you will need a robots.txt file for each one you want to manage. You will also need separate robots.txt files for your secure (https) and nonsecure (http) pages.
Creating a Robots.txt File

Creating a robots.txt file is a relatively simple process: you only need to create a plain-text file named robots.txt in any text editor, such as TextPad, Notepad, or Apple TextEdit. Your robots.txt file only needs to contain two lines to be effective. If you wanted to stop the spiders from searching the archives of the blog on your site, for example, you would add the following to your robots.txt file:

    User-agent: *
    Disallow: /archives/
The "User-agent" line defines which search engine spiders the rule applies to. By placing the asterisk (*) here, you are instructing all search engine spiders to avoid the specified pages. You can, however, target a specific search engine spider by replacing the asterisk with one of the following names:

     * Google - Googlebot

     * Yahoo - Slurp

     * Microsoft - msnbot

     * Ask - Teoma
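As a quick sanity check on how a crawler reads such a rule, Python's standard-library urllib.robotparser can evaluate a robots.txt body against any user-agent name. This is a minimal sketch: example.com and the URLs are placeholders, and note that this parser only understands literal path prefixes, not wildcards.

```python
from urllib.robotparser import RobotFileParser

# The two-line file from the example above, blocking /archives/ for all bots.
rules = """\
User-agent: *
Disallow: /archives/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "https://example.com/archives/2009/"))  # False
print(rp.can_fetch("Slurp", "https://example.com/contact.html"))        # True
```

Because the User-agent line is *, every spider gets the same answer; swapping in a specific bot name in the rules would scope the block to that spider alone.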

The "Disallow" line specifies which part of the site you want the spiders to ignore. So, if you want the spiders to ignore the categories portion of your blog, you would replace "archives" with "category," and so on. To instruct the spiders to ignore multiple sections, simply add a new "Disallow" line for each area.

Just as you can name areas that you want the spiders to avoid, you can also list areas that you want specific spiders to view. For example, while you may want most spiders to avoid a given area, you may still want the MSN media bot, Google image bot, or Google AdWords bot to visit it. In this case, you can use the asterisk to keep all search engines out of the area while explicitly allowing a specific spider in. If you want Google's AdSense bot to access a folder, for example, you would create the following directives:

     User-agent: *
     Disallow: /folder/

     User-agent: Mediapartners-Google
     Allow: /folder/
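One way to double-check an exception like this is Python's standard-library urllib.robotparser, which understands Allow lines and per-agent groups. A sketch, with example.com and the page path as placeholders:

```python
from urllib.robotparser import RobotFileParser

# Mirror of the example above: everyone is kept out of /folder/,
# except Google's AdSense crawler, which gets its own group.
rules = [
    "User-agent: *",
    "Disallow: /folder/",
    "",
    "User-agent: Mediapartners-Google",
    "Allow: /folder/",
]

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "https://example.com/folder/page.html"))             # False
print(rp.can_fetch("Mediapartners-Google", "https://example.com/folder/page.html"))  # True
```

The parser matches Mediapartners-Google to its dedicated group, so the Allow line overrides the blanket Disallow that applies to every other spider.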

You can also use your robots.txt files to prevent dynamic URLs from being indexed by the search engine spiders. You can accomplish this with the following template:

     User-agent: *
     Disallow: /*&

With this directive, you are instructing the spiders to skip every URL containing an ampersand, so only one version of the URL gets indexed. For example, if you had the following dynamic URLs:

     * /greatcars/details.php?propcode=ANCHORS&SRCH=tr

     * /greatcars/details.php?propcode=ANCHORS&vr=1

     * /greatcars/details.php?propcode=ANCHORS

Your robots.txt instructions will tell the spiders to only list the third example because it will disallow any URLs that start with a forward slash (/) and contain the & symbol. You can use the same strategy to block any URLs containing a question mark by using the following:

     User-agent: *
     Disallow: /*?
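Note that the * wildcard is an extension honored by the major engines rather than part of the original robots.txt standard, and Python's urllib.robotparser ignores it. To see which of the dynamic URLs above would survive a wildcard rule, a hypothetical translation into regular expressions can serve as an illustration (the helper name is my own, not a library function):

```python
import re

def robots_pattern(pattern: str) -> re.Pattern:
    # Hypothetical helper: translate a robots.txt wildcard pattern into a
    # regex anchored at the start of the URL path. '*' matches any run of
    # characters, the way the major engines document the extension.
    return re.compile("^" + ".*".join(re.escape(p) for p in pattern.split("*")))

blocked = robots_pattern("/*&")  # the Disallow pattern from the example

urls = [
    "/greatcars/details.php?propcode=ANCHORS&SRCH=tr",
    "/greatcars/details.php?propcode=ANCHORS&vr=1",
    "/greatcars/details.php?propcode=ANCHORS",
]
crawlable = [u for u in urls if not blocked.match(u)]
print(crawlable)  # only the parameter-free third URL remains
```

Swapping the pattern for "/*?" would filter on question marks the same way.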

Or, you can block all directories that contain a specific word in the URL. For example, you might create a robots.txt file such as the following:

     User-agent: *
     Disallow: /*corvette*/

With this command, any page whose URL contains the word "corvette" will not be crawled by the spiders. It is important to use caution with such directives, however, as they cause the spiders to avoid every page containing the word you specify. As a result, you may accidentally block pages that you do want indexed. If you want to block all but one or two pages whose URLs contain a specific word, you can create a robots.txt file that explicitly allows the pages you still want indexed. In this case, your robots.txt file would look something like this:

     User-agent: *
     Disallow: /*corvette*/
     Allow: /greatcars/corvettesandvipers/details.html

It is also possible for you to instruct the spiders to avoid an entire folder on your website while still allowing it to access specific pages within that folder. 
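When an Allow and a Disallow rule both match the same URL, Google documents that the longest (most specific) matching rule wins, with Allow winning a tie. A hypothetical sketch of that tie-breaking, using rules in the spirit of the example above (the helper and function names are my own):

```python
import re

def robots_pattern(pattern: str) -> re.Pattern:
    # Hypothetical helper: '*' in a robots.txt path matches any run of characters.
    return re.compile("^" + ".*".join(re.escape(p) for p in pattern.split("*")))

# Rule set mirroring the example: block every URL containing "corvette",
# but keep one detail page crawlable.
disallow_rules = ["/*corvette*/"]
allow_rules = ["/greatcars/corvettesandvipers/details.html"]

def can_crawl(path: str) -> bool:
    # Google-style tie-breaking: the longest matching pattern wins;
    # on a tie, Allow beats Disallow. No match at all means crawlable.
    best_len, allowed = -1, True
    for pat in disallow_rules:
        if robots_pattern(pat).match(path) and len(pat) > best_len:
            best_len, allowed = len(pat), False
    for pat in allow_rules:
        if robots_pattern(pat).match(path) and len(pat) >= best_len:
            best_len, allowed = len(pat), True
    return allowed

print(can_crawl("/greatcars/corvettesandvipers/details.html"))  # True
print(can_crawl("/greatcars/corvette2020/specs.html"))          # False
```

The exact tie-breaking rules vary by engine, so treat this as a model of the idea rather than a guarantee of any particular crawler's behavior.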



© SEO Friendly Copyright by blogdarma1280 | The Gadget | How To Blogging | Template by Custom Templates | Modify blogdarma1280.blogspot.com