Reputation: 3227
I have an essay that I want to release under an open licence so that others can use it, but I don't want it to be read by Turnitin (search for it if you don't know what it is).
I want to host it in my university's public_html directory, so I don't have access to the top directory's robots.txt.
An answer to this problem would explain how to stop Turnitin from reading the page while still allowing humans and search engine spiders to find, read and index it.
Upvotes: 2
Views: 135
Reputation: 750
The TurnitinBot general information page at:
https://turnitin.com/robot/crawlerinfo.html
describes how their plagiarism prevention service crawls Internet content.
The access section:
https://turnitin.com/robot/crawlerinfo.html#access
describes how robots.txt can be configured to stop TurnitinBot from crawling a document by adding an entry for its user agent:
User-agent: TurnitinBot
Disallow: ...your document...
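As a rough illustration of how a rule like this behaves, here is a small Python sketch that evaluates it with the standard library's urllib.robotparser. The host example.edu and the path /people/me/essay.html are hypothetical stand-ins for your own URL. It only shows how a compliant parser reads the rule, not what TurnitinBot actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical location of the essay, for illustration only.
ESSAY_PATH = "/people/me/essay.html"
ESSAY_URL = "https://example.edu" + ESSAY_PATH

# The two robots.txt lines from the answer, with the placeholder
# path filled in by the hypothetical one above.
rules = [
    "User-agent: TurnitinBot",
    f"Disallow: {ESSAY_PATH}",
]

parser = RobotFileParser()
parser.parse(rules)  # parse the rules directly, no network access needed

# A compliant TurnitinBot would be refused; other crawlers are unaffected.
turnitin_allowed = parser.can_fetch("TurnitinBot", ESSAY_URL)
googlebot_allowed = parser.can_fetch("Googlebot", ESSAY_URL)

print("TurnitinBot allowed:", turnitin_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

Because the rule names only the TurnitinBot user agent, crawlers that do not match it fall through to the default, which is to allow the fetch.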
Because you don't have access to the robots.txt file, you could instead try including a meta tag in the document, provided you can publish your essay in HTML format:
<meta name="TurnitinBot" content="noindex" />
(If the essay isn't currently in HTML and this matters enough to you, could you convert it?)
Their crawlerinfo page above says this about "good crawling etiquette":
It should also obey META exclusion tags within pages.
so hopefully they follow the etiquette they describe on their own page.
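If you go the meta tag route, it may be worth checking that the tag actually ended up in the published page. Below is a minimal Python sketch using the standard library's html.parser. The names RobotsMetaFinder and blocks_crawler are made up for this example, and the sample page is hypothetical. It only confirms that the tag is present. Whether TurnitinBot honours a crawler-specific meta name is still an assumption.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the content of every named <meta> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lowercased meta name -> lowercased content

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr_map = {key: (value or "") for key, value in attrs}
        name = attr_map.get("name", "").lower()
        if name:
            self.directives[name] = attr_map.get("content", "").lower()


def blocks_crawler(html, crawler_name):
    """Return True if the page has a noindex meta tag for crawler_name."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    content = finder.directives.get(crawler_name.lower(), "")
    return "noindex" in [part.strip() for part in content.split(",")]


# Hypothetical page containing the meta tag suggested above.
page = (
    "<html><head>"
    '<meta name="TurnitinBot" content="noindex" />'
    "</head><body>Essay text</body></html>"
)

print(blocks_crawler(page, "TurnitinBot"))
print(blocks_crawler(page, "Googlebot"))
```

Since the tag names TurnitinBot specifically rather than the generic "robots", other search engine spiders should not treat it as an instruction aimed at them.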
Upvotes: 1