user1277170

Reputation: 3227

No Access to Top Directory, Want to Stop Certain Robots

I have an essay I want to release under an open licence so that others can use it, but I don't want it to be read by Turnitin (Google it if you don't know what that is).

I want to host it in my university's public_html directory, so I don't have access to the top directory's robots.txt.

An answer to this problem should explain how to stop Turnitin from reading the page while still allowing humans and search engine spiders to find, read, and index it.

Upvotes: 2

Views: 135

Answers (1)

gcbound

Reputation: 750

The TurnitinBot general information page at:

https://turnitin.com/robot/crawlerinfo.html

describes how their plagiarism prevention service crawls Internet content.

The section:

https://turnitin.com/robot/crawlerinfo.html#access

describes how robots.txt can be configured to prevent TurnitinBot from crawling a document by adding an entry for their user agent:

    User-agent: TurnitinBot
    Disallow: ...your document...
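
If someone with access to the top-level robots.txt is willing to add that entry for you, a quick way to sanity-check the rule is Python's standard `urllib.robotparser`. This is only a sketch: `ESSAY_PATH` is a hypothetical placeholder, not your real path, and it shows how a standards-following parser reads the rule, not how TurnitinBot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical path, for illustration only; substitute the real location.
ESSAY_PATH = "/users/example/essay.html"

# The two-line entry from above, with the placeholder path filled in.
rules = [
    "User-agent: TurnitinBot",
    f"Disallow: {ESSAY_PATH}",
]

parser = RobotFileParser()
parser.parse(rules)

# The rule should block TurnitinBot but leave other crawlers unaffected.
turnitin_allowed = parser.can_fetch("TurnitinBot", ESSAY_PATH)
other_allowed = parser.can_fetch("Googlebot", ESSAY_PATH)
print(turnitin_allowed, other_allowed)
```

Because the entry names only the TurnitinBot user agent, crawlers with other user agents are not matched by it.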

Because you don't have access to the robots.txt file, you could instead try including a meta tag in the document, provided you can publish the essay as HTML:

    <meta name="TurnitinBot" content="noindex" />

(If the essay isn't in HTML and this is important enough to you, could you convert it?)
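
To confirm the tag actually ends up in the page you publish, you could parse the HTML with Python's standard `html.parser`. This is a minimal sketch: the `RobotsMetaFinder` class and the sample `page` string are made up for illustration, and it only checks that the tag is present, not that any crawler honours it.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the content values of <meta> tags addressed to one bot."""

    def __init__(self, bot_name):
        super().__init__()
        self.bot_name = bot_name.lower()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        if attr_map.get("name", "").lower() == self.bot_name:
            self.directives.append(attr_map.get("content", "").lower())


# Illustrative page; in practice, read the HTML file you are hosting.
page = """<html><head>
<title>My essay</title>
<meta name="TurnitinBot" content="noindex" />
</head><body><p>Essay text.</p></body></html>"""

finder = RobotsMetaFinder("TurnitinBot")
finder.feed(page)
print(finder.directives)
```

An empty list would mean the tag is missing or addressed to a different name.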

Their crawlerinfo page above says this about "good crawling etiquette":

It should also obey META exclusion tags within pages.

Hopefully they follow the good etiquette they describe on their own page.

Upvotes: 1
