Ridd

Reputation: 11917

robots.txt in codeigniter - allow view/function

I read a little about robots.txt and learned that I should disallow all folders in my web application, but I would like to allow bots to read the main page and one view (the URL is, for example, www.mywebapp/searchresults; it's a CodeIgniter route, served by a function in application/controllers).

The folder structure, for example, is:

-index.php (should be readable by bots)
-application
  -controllers
    -controller (contains the function that loads the view)
  -views
-public

Should I create a robots.txt like this:

User-agent: *
Disallow: /application/
Disallow: /public/
Allow: /application/controllers/function

or, using routes, something like:

User-agent: *
Disallow: /application/
Disallow: /public/
Allow: /www.mywebapp/searchresults

or maybe using views?

User-agent: *
Disallow: /application/
Disallow: /public/
Allow: /application/views/search/index.php

Thanks!

Upvotes: 1

Views: 13835

Answers (2)

Ridd
Ridd

Reputation: 11917

Answer to my own, old question:

When we would like to allow bots to read some page, we need to use the URL from our routing, not the file path. Note that robots.txt paths are relative to the host root and must not include the domain, so in this case:

Allow: /searchresults

In some cases we can also exclude pages with an HTML meta tag (added to the page's <head>):

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

When we would like to block a folder, e.g., one with pictures, just do:

Disallow: /public/images
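Putting this advice together, here is a minimal sketch of the resulting robots.txt, checked with Python's standard urllib.robotparser. The route /searchresults comes from the question; the exact final file is an assumption:

```python
from urllib import robotparser

# Hypothetical robots.txt assembled from the answer above
rules = """\
User-agent: *
Disallow: /application/
Disallow: /public/
Allow: /searchresults
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# The routed URL stays crawlable; framework folders are blocked
print(rp.can_fetch("*", "/searchresults"))              # True
print(rp.can_fetch("*", "/application/controllers/x"))  # False
print(rp.can_fetch("*", "/public/images/logo.png"))     # False
```

Because crawlers only ever see routed URLs, blocking the application and public folders while allowing the route covers both concerns from the question.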

Upvotes: 1

JAVA_RMI

Reputation: 139

You don't block the view file, as it isn't directly accessible to crawlers. You need to block the URL that is used to access your view.

The robots.txt file MUST be placed in the document root of the host. It won’t work in other locations.

If your host is www.example.com, it needs to be accessible at http://www.example.com/robots.txt

To remove directories or individual pages of your website, you can place a robots.txt file at the root of your server.

When creating your robots.txt file, please keep the following in mind: when deciding which pages to crawl on a particular host, Googlebot will obey the first record in the robots.txt file with a User-agent starting with "Googlebot". If no such entry exists, it will obey the first entry with a User-agent of "*". Additionally, Google has extended the robots.txt standard through the use of asterisks: Disallow patterns may include "*" to match any sequence of characters, and patterns may end in "$" to indicate the end of a name.

To remove all pages under a particular directory (for example, listings), you'd use the following robots.txt entry:

User-agent: Googlebot
Disallow: /listings

To remove all files of a specific file type (for example, .gif), you'd use the following robots.txt entry:

User-agent: Googlebot
Disallow: /*.gif$

To remove dynamically generated pages, you'd use this robots.txt entry:

User-agent: Googlebot
Disallow: /*?
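The Googlebot wildcard rules above can be illustrated with a small Python sketch. This is an informal model of the "*" and "$" semantics, not Google's actual matcher, and the helper name is made up:

```python
import re

def pattern_to_regex(pattern: str) -> str:
    # Hypothetical helper: translate a Google-style robots.txt pattern
    # into a regex. '*' matches any run of characters; a trailing '$'
    # anchors the match at the end of the URL path.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    parts = (".*" if ch == "*" else re.escape(ch) for ch in body)
    return "^" + "".join(parts) + ("$" if anchored else "")

# '/*.gif$' blocks any URL ending in .gif
print(bool(re.match(pattern_to_regex("/*.gif$"), "/img/photo.gif")))  # True
print(bool(re.match(pattern_to_regex("/*.gif$"), "/img/photo.png")))  # False
# '/*?' blocks any URL containing a query string
print(bool(re.match(pattern_to_regex("/*?"), "/search?q=ci")))        # True
```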
Option 2: Meta tags

Another standard, which can be more convenient for page-by-page use, involves adding a <META> tag to an HTML page to tell robots not to index the page. This standard is described at http://www.robotstxt.org/wc/exclusion.html#meta.

To prevent all robots from indexing a page on your site, you'd place the following meta tag into the <HEAD> section of your page:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

To allow other robots to index the page on your site, preventing only Google's robots from indexing the page, you'd use the following tag:

<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">

To allow robots to index the page on your site but instruct them not to follow outgoing links, you'd use the following tag:

<META NAME="ROBOTS" CONTENT="NOFOLLOW">
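As a sanity check on the meta-tag approach, a short Python sketch using the standard html.parser module shows how a crawler might pick up these directives (the class name is made up for illustration):

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    # Collects the content of any <meta name="robots"> tag
    # (tag and attribute names are case-insensitive in HTML).
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if d.get("name", "").lower() == "robots":
                self.directives.append(d.get("content", ""))

finder = RobotsMetaFinder()
finder.feed('<html><head>'
            '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'
            '</head></html>')
print(finder.directives)  # ['NOINDEX, NOFOLLOW']
```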

For further reference:

https://www.elegantthemes.com/blog/tips-tricks/how-to-create-and-configure-your-robots-txt-file

Upvotes: 0
