HOW TO restrict search engine bots' access to specific pages?

Post by administrator » Feb 20th, '14, 09:59

This topic applies to all scripts based on MicroCMS, such as Hotel Site, Medical Appointment, Shopping Cart, etc.

In some cases it's useful to restrict search engine bots' access to specific pages.
Why is that important?

Let's say the full link to a Medical Appointment page looks like this:

Code:

http://www.apphp.com/php-medical-appointment/examples/sample2/index.php?page=appointment_details&prm=ZG9jaWQ9MSZkc3BlY2lkPTEmc2NoaWQ9MyZkYWRkaWQ9MSZkYXRlPTIwMTQtMDItMTkmc3RhcnRfdGltZT0xNS0zMCZkdXJhdGlvbj0xNQ==


and this page shows a doctor's availability on a certain date. Google's bots will crawl it and store it in their index, so after some time you may see this page in search results, even though it shows appointments in the PAST! This simple example shows that in some cases you may prefer not to let search engine bots index specific pages on your site, for example pages that include internal information or show search results.
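The date is not visible in the link itself; it is packed into the prm parameter. Judging from this one URL (an observation, not documented behavior of the script), prm looks like Base64-encoded query-string text. The short Python sketch below decodes it; decode_prm is a name made up for this example.

```python
import base64
from urllib.parse import parse_qs, urlsplit

# The appointment details link quoted in the post above.
URL = (
    "http://www.apphp.com/php-medical-appointment/examples/sample2/index.php"
    "?page=appointment_details&prm=ZG9jaWQ9MSZkc3BlY2lkPTEmc2NoaWQ9MyZkYWRkaWQ9"
    "MSZkYXRlPTIwMTQtMDItMTkmc3RhcnRfdGltZT0xNS0zMCZkdXJhdGlvbj0xNQ=="
)


def decode_prm(url: str) -> dict[str, str]:
    """Decode the prm query parameter, assuming it is Base64-encoded
    query-string text (inferred from this one URL, not documented)."""
    query = parse_qs(urlsplit(url).query)
    raw = base64.b64decode(query["prm"][0]).decode("ascii")
    # parse_qs returns a list of values per key; keep the first one.
    return {key: values[0] for key, values in parse_qs(raw).items()}


params = decode_prm(URL)
print(params["date"], params["start_time"])  # 2014-02-19 15-30
```

So the indexed page is tied to one specific day; once that day has passed, it can only show an out-of-date schedule.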

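The simplest way to stop search engines from crawling such pages is to block access to them using a robots.txt file.
A robots.txt file, placed in the root of your site, tells the search engine robots that crawl the web which URLs they should not visit. Well-behaved robots respect it, but it is a convention, not a technical barrier.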
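Crawlers only read robots.txt from the root of the host, so its location depends on the domain, not on the folder where the script is installed. The Python sketch below illustrates what that means for the demo link above; robots_txt_url and disallow_line are hypothetical helper names made up for this example. Because that demo lives in a subfolder, a rule blocking it has to include the full path, while the shorter /index.php?page=... rules in the example further down assume a MicroCMS installed at the site root.

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url: str) -> str:
    """Crawlers only look for robots.txt at the root of the host."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def disallow_line(page_url: str) -> str:
    """Hypothetical helper: a Disallow rule covering this page, built from
    everything after the host name (path plus query string)."""
    parts = urlsplit(page_url)
    target = parts.path + (f"?{parts.query}" if parts.query else "")
    return f"Disallow: {target}"


# The demo link from this post, shortened to the part worth blocking.
page = ("http://www.apphp.com/php-medical-appointment/examples/sample2/"
        "index.php?page=appointment_details")
print(robots_txt_url(page))  # http://www.apphp.com/robots.txt
print(disallow_line(page))
```

For the demo this produces Disallow: /php-medical-appointment/examples/sample2/index.php?page=appointment_details, and the rule only has an effect in the root robots.txt; a robots.txt placed inside the script folder is simply ignored by crawlers.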


The simplest robots.txt file uses two rules:

  • User-agent: the robot the following rule applies to
  • Disallow: the URL path you want to block (every URL that begins with this path is blocked)

These two lines are considered a single entry in the file. You can include as many entries as you want. You can include multiple Disallow lines and multiple user-agents in one entry.
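One way to try out such an entry is Python's standard urllib.robotparser module. In the sketch below the bot names and the placeholder host www.example.com are only examples, and Python's parser is simpler than the one Google uses, so treat it as a rough local check rather than a guarantee of how a given search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# One entry: two User-agent lines share the same two Disallow lines.
# The bot names are examples only.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Bingbot
Disallow: /index.php?page=login
Disallow: /index.php?page=appointment_details
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

site = "http://www.example.com"  # placeholder host
for bot in ("Googlebot", "Bingbot", "SomeOtherBot"):
    blocked = not parser.can_fetch(bot, site + "/index.php?page=login")
    print(bot, "blocked" if blocked else "allowed")
```

Run as-is, it should report Googlebot and Bingbot as blocked and SomeOtherBot as allowed, because no entry names that bot.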

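Each section in the robots.txt file is separate and does not build upon previous sections. A bot obeys only the sections that name it (Google merges several sections for the same bot into one) and ignores the "User-agent: *" section when a more specific one exists. For example, in the file below Googlebot may still crawl /folder1/ but not /folder2/, the login page or the appointment details pages, while all other bots are blocked only from /folder1/: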

Code:

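# Applies to every bot that has no section of its own
User-agent: *
Disallow: /folder1/

# Sections for Googlebot; Google merges them into one group
User-agent: Googlebot
Disallow: /folder2/

User-agent: Googlebot
Disallow: /index.php?page=login

User-agent: Googlebot
# The trailing * is optional: Disallow already matches by prefix
Disallow: /index.php?page=appointment_details*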
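The example can be checked locally with Python's standard urllib.robotparser module, with two caveats: that parser uses only the first section matching a bot, and it treats Disallow values as plain prefixes without "*" wildcards. The sketch below therefore merges the Googlebot sections into one group and drops the trailing "*", which Google treats the same way. The host www.example.com, the bot name OtherBot and the page paths are placeholders, and the results approximate, not reproduce, what Googlebot does.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the example above, adjusted for Python's parser:
# Googlebot rules in one group (this parser stops at the first matching
# group) and no trailing "*" (it has no wildcard support).
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder1/

User-agent: Googlebot
Disallow: /folder2/
Disallow: /index.php?page=login
Disallow: /index.php?page=appointment_details
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

SITE = "http://www.example.com"  # placeholder host


def allowed(bot: str, path: str) -> bool:
    """True if the given bot may crawl SITE + path under these rules."""
    return parser.can_fetch(bot, SITE + path)


# Googlebot ignores the "*" section, so /folder1/ stays open to it...
print(allowed("Googlebot", "/folder1/page.html"))  # True
# ...while any other bot is blocked from /folder1/ only.
print(allowed("OtherBot", "/folder1/page.html"))  # False
print(allowed("OtherBot", "/index.php?page=login"))  # True
print(allowed("Googlebot", "/index.php?page=appointment_details&prm=x"))  # False
```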


Find more information here:
1. Google Help: https://support.google.com/webmasters/a ... 6449?hl=en
2. Wikipedia "Robots exclusion standard": http://en.wikipedia.org/wiki/Robots_exclusion_standard

Example of robots.txt file: robots.zip (144 Bytes)
