No Access to Top Directory, Want to Stop Certain Robots

Question

I have an essay that I want to release under an open licence so that others can use it, but I don't want it to be read by Turnitin (google it if you don't know what that is).

I want to host it in my public_html directory on my university's web server, so I don't have access to the robots.txt file in the site's top-level directory.

An answer should explain how to stop Turnitin from reading the page while still allowing humans and search engine spiders to find, read and index it.

gcbound · Accepted Answer

The TurnitinBot general information page at:

https://turnitin.com/robot/crawlerinfo.html

describes how their plagiarism prevention service crawls Internet content.

The section:

https://turnitin.com/robot/crawlerinfo.html#access

describes how robots.txt can be configured to stop TurnitinBot from crawling a document by adding a record for their user agent:

    User-agent: TurnitinBot
    Disallow: ...your document...
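A standards-following crawler applies that record only to agents whose name matches `TurnitinBot`, so other spiders are unaffected. The sketch below checks this with Python's standard-library `urllib.robotparser`. The URL `https://www.example.edu/~student/essay.html` and the helper `allowed` are hypothetical placeholders, and because robots.txt lives at the site root, this record would have to be added by the university's web administrator rather than by you.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical root robots.txt. The Disallow path is a placeholder
# standing in for "...your document..." above.
ROBOTS_TXT = """\
User-agent: TurnitinBot
Disallow: /~student/essay.html
"""

# Hypothetical public_html URL of the essay.
ESSAY = "https://www.example.edu/~student/essay.html"


def allowed(agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("TurnitinBot", "Googlebot"):
        print(f"{agent}: {'allowed' if allowed(agent, ESSAY) else 'blocked'}")
```

Run as a script, it reports TurnitinBot as blocked and Googlebot as allowed; other pages on the site stay fetchable by both, since the Disallow path only matches the essay.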

Because you don't have access to the robots.txt file, you could instead try including a META exclusion tag addressed to TurnitinBot in the <head> of the document, provided you can publish your essay in HTML format.

(If you don't currently publish it in HTML and this matters enough to you, could you?)
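As a sketch of what such a tag might look like, and of how a crawler that honours META exclusion tags could interpret it, the Python below parses a page with the standard-library `html.parser` and collects directives from `<meta name="robots">` (which addresses every crawler) and from a tag named after a single user agent. Addressing the tag to `TurnitinBot` by name is an assumption modelled on the way Google documents `name="googlebot"`; check the crawlerinfo page for what TurnitinBot actually honours. `PAGE`, `RobotsMetaParser` and `may_index` are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical essay page. Addressing a META tag to TurnitinBot by name is
# an assumption modelled on Google's name="googlebot" convention.
PAGE = """<html><head>
<meta name="TurnitinBot" content="noindex, nofollow">
</head><body>Essay text...</body></html>"""


class RobotsMetaParser(HTMLParser):
    """Collect META exclusion directives that apply to one user agent."""

    def __init__(self, agent: str) -> None:
        super().__init__()
        self.agent = agent.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: value or "" for key, value in attrs}
        name = attr.get("name", "").lower()
        # name="robots" addresses every crawler; any other name
        # addresses only the crawler with that user-agent token.
        if name in ("robots", self.agent):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def may_index(agent: str, html: str) -> bool:
    """Return False if a META tag tells `agent` not to index the page."""
    parser = RobotsMetaParser(agent)
    parser.feed(html)
    parser.close()
    return "noindex" not in parser.directives


if __name__ == "__main__":
    for agent in ("TurnitinBot", "Googlebot"):
        print(f"{agent}: may index = {may_index(agent, PAGE)}")
```

A `<meta name="robots" content="noindex">` tag would also make `may_index` return False for Googlebot, which is why only the agent-specific form fits the goal of keeping the essay in search results.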

Their crawlerinfo page above says this about "good crawling etiquette":

It should also obey META exclusion tags within pages.

Hopefully they follow the good etiquette they set out on their own page.
