How to Use a ROBOTS.TXT File

Information:

The robots.txt file lets you restrict search engines' access to certain pages or files of your website. This gives you control over which parts of your site the search engines index.


 

1. Create a text file (.txt) named: robots.txt.

2. Insert the desired directives in this file to tell the search engines what may or may not be indexed.

3. Place the file in the “root” of your website. Example: in the /public_html/ directory (Linux) or /www/ (Windows).

You should see your file if you browse to: http://www.yourdomain.tld/robots.txt
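The placement rule above is worth remembering: every host has exactly one robots.txt, always at the root, whatever page you start from. A minimal Python sketch of that rule (the domain name is just a placeholder):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # robots.txt always lives at the root of the host, whatever the page path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://www.yourdomain.tld/some/deep/page.html"))
# http://www.yourdomain.tld/robots.txt
```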

4. Here is an example:
User-agent: *
Disallow: /cgi-bin/

This example prevents all search engines from indexing the contents of the /cgi-bin/ directory.

– On the 1st line, User-agent: the star (*) designates all search engines.

– On the 2nd line, Disallow: you enter the name of the directory, with the slashes (/).
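A quick way to check what a given robots.txt actually permits is Python's standard urllib.robotparser module. A minimal sketch using the two-line example above (the bot name and URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the example above.
rules = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
]

rp = RobotFileParser()
rp.parse(rules)

# Anything under /cgi-bin/ is blocked for every crawler...
print(rp.can_fetch("AnyBot", "http://www.yourdomain.tld/cgi-bin/script.pl"))  # False
# ...while the rest of the site stays reachable.
print(rp.can_fetch("AnyBot", "http://www.yourdomain.tld/index.html"))         # True
```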

5. Here is a small list of values used on the 1st line:

 

User-agent: *

Applies to all search engines.

User-agent: Googlebot

Applies only to the Google search engine.

User-agent: MSNBot

Applies only to the MSN search engine.

User-agent: Slurp

Applies only to the Yahoo! search engine.

User-agent: Fast

Applies only to the Lycos and Fast/AllTheWeb search engines.

6. Here is a small list of possible values for the 2nd line:

 

Disallow: /

Excludes every page of the website (nothing can be crawled).

Disallow:

Excludes no page of the site (no restriction).

An empty or non-existent robots.txt file has the same effect.

Disallow: /cgi-bin/

Excludes everything inside the /cgi-bin/ directory from indexing.

Disallow: /*.[file extension]$

Excludes every file with the indicated extension from indexing. (The * and $ wildcards are extensions honored by the major crawlers, not part of the original standard.)

Ex: Disallow: /*.pdf$

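The first two Disallow forms in the list behave as opposites, which urllib.robotparser makes easy to verify (note that Python's parser does not implement the * and $ wildcards, so the extension pattern cannot be tested this way; the bot name and URL are placeholders):

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /" shuts out the whole site...
block_all = RobotFileParser()
block_all.parse(["User-agent: *", "Disallow: /"])

# ...while an empty "Disallow:" (like an empty robots.txt) allows everything.
allow_all = RobotFileParser()
allow_all.parse(["User-agent: *", "Disallow:"])

print(block_all.can_fetch("AnyBot", "http://www.yourdomain.tld/page.html"))  # False
print(allow_all.can_fetch("AnyBot", "http://www.yourdomain.tld/page.html"))  # True
```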

7. You can list several Disallow lines, as shown in this example:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /secure/

Therefore, in this case you exclude the contents of these 3 directories from all search engines.
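The three rules above can be checked the same way with urllib.robotparser (bot name and paths are placeholders for illustration):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /images/",
    "Disallow: /secure/",
])

# Every path under the three listed directories is blocked...
for path in ("/cgi-bin/run.pl", "/images/logo.png", "/secure/login.html"):
    print(path, rp.can_fetch("AnyBot", "http://www.yourdomain.tld" + path))  # all False
# ...and the rest of the site remains open.
print(rp.can_fetch("AnyBot", "http://www.yourdomain.tld/about.html"))        # True
```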

8. You can also address the various search engines individually:

User-agent: *
Disallow:
User-agent: Googlebot
Disallow: /cgi-bin/

In this case you first authorize all search engines to index the website, and then you forbid Google the /cgi-bin/ directory.
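A sketch of this mixed policy, again with urllib.robotparser (OtherBot stands in for any non-Google crawler):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow:",
    "",
    "User-agent: Googlebot",
    "Disallow: /cgi-bin/",
])

# Googlebot is bound by its own, stricter group...
print(rp.can_fetch("Googlebot", "http://www.yourdomain.tld/cgi-bin/x.pl"))  # False
# ...while every other crawler falls back to the permissive "*" group.
print(rp.can_fetch("OtherBot", "http://www.yourdomain.tld/cgi-bin/x.pl"))   # True
```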

Do not forget that you can also place META tags in individual pages to give the robots per-page instructions (here are some examples):

 

<META NAME="MSNBot" CONTENT="noindex" />

Prevents MSNBot from indexing the page.

<META NAME="robots" CONTENT="noindex" />

Prevents all robots from indexing the page (in a META tag, "all robots" is written robots, not *).

<META NAME="Googlebot" CONTENT="nofollow" />

Prevents Googlebot from following the links on the page.

<META NAME="robots" CONTENT="nofollow" />

Prevents all robots from following the links on the page.

<META NAME="MSNBot" CONTENT="noindex,nofollow" />

Prevents MSNBot from indexing the page and from following its links.

<META NAME="GoogleBot" CONTENT="nocache" />

Prevents Googlebot from caching the page.

<META NAME="GoogleBot" CONTENT="noarchive" />

Prevents Googlebot from archiving the page (showing a cached copy in search results).
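To check which robots directives a finished page actually carries, you can extract its META tags with Python's standard html.parser module. A minimal sketch (the sample page is invented for illustration):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects robots-related META directives from an HTML page."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # bot name (lowercased) -> list of directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name = (a.get("name") or "").lower()
        content = a.get("content") or ""
        if name in ("robots", "googlebot", "msnbot"):
            self.directives[name] = [d.strip().lower() for d in content.split(",")]

html_page = (
    '<html><head>'
    '<META NAME="robots" CONTENT="noindex, nofollow" />'
    '<META NAME="GoogleBot" CONTENT="noarchive" />'
    '</head><body></body></html>'
)

p = RobotsMetaParser()
p.feed(html_page)
print(p.directives)
# {'robots': ['noindex', 'nofollow'], 'googlebot': ['noarchive']}
```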
