Reputation: 329
I want to back up some files in the root of my website, in something like /www/mysite/myfolder/myfile.xls. Are crawlers able to find that directory, even if it is not used for files that are necessary for the website? Thank you
Upvotes: 0
Views: 1472
Reputation: 66
Even without a link, simple file names can be brute-forced with the help of dictionaries. There are tools for such attacks, such as DirBuster.
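For illustration only, here is a minimal sketch of such dictionary-based guessing (Python with the requests library; the base URL and word list are placeholders, not values from the question):

    import requests

    # Candidate names an attacker might try; real tools ship word lists
    # with thousands of entries.
    WORDLIST = ["backup", "old", "admin", "db.sql", "site.zip", "myfile.xls"]

    def probe(base_url, words):
        found = []
        for word in words:
            url = f"{base_url.rstrip('/')}/{word}"
            # HEAD keeps traffic low; anything other than 404 hints the path exists.
            resp = requests.head(url, allow_redirects=False, timeout=5)
            if resp.status_code != 404:
                found.append((url, resp.status_code))
        return found

    for url, status in probe("http://example.com", WORDLIST):
        print(status, url)

So an unlinked file with a guessable name offers no real protection.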
Upvotes: 1
Reputation: 7297
A web crawler that does not use brute force or dictionary trials (explained later) can only find a file if at least one link to the file exists (on a page the crawler has visited before).
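To illustrate the link requirement, here is a minimal sketch (Python standard library; not part of the original answer) of how a crawler collects candidate URLs. It can only queue what appears as a link in pages it has already fetched:

    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkCollector(HTMLParser):
        # Collects absolute URLs from <a href="..."> tags.
        def __init__(self, base_url):
            super().__init__()
            self.base_url = base_url
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(urljoin(self.base_url, value))

    def discover(url):
        html = urlopen(url).read().decode("utf-8", errors="replace")
        collector = LinkCollector(url)
        collector.feed(html)
        return collector.links

    # A backup file like /myfolder/myfile.xls only shows up in this list
    # if some crawled page actually links to it.
    print(discover("http://example.com/"))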
From the path /www/myfolder/myfile.xls I suspect there may even be another issue. A web crawler can only find files that are publicly available, and not everything under /www, /var/www, /htdocs or whatever document root is being used is necessarily public. There may be a structure like /www/mysite/public, where only public is reachable from the web. With such a structure you can make sure that files in /www/mysite cannot be accessed without a permission check (for example by a PHP script) before the download.
So you have to check whether there is a link to the file anywhere and whether your backup folder lies inside the publicly served directory at all.
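As a minimal sketch of such a layout, assuming Python's built-in development server as a stand-in for a real web server configuration, only files below /www/mysite/public ever get a URL:

    from functools import partial
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    # Serve only the public/ subdirectory; a backup in /www/mysite/backups/
    # is simply outside the served tree and has no URL at all.
    handler = partial(SimpleHTTPRequestHandler, directory="/www/mysite/public")
    HTTPServer(("", 8000), handler).serve_forever()

The same idea applies to Apache or nginx: point the document root at public/ and keep backups outside it.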
There is an exception where even files without a link can be found: search engines could try to extend the already known URL space of a domain with known or random words. This can of course only be done sporadically. Consider a TinyURL-style generator: such URLs usually consist of a short known base and a few random characters. A search engine could try out these short character sequences hoping to find pages in the so-called deep web. For example, perhaps nobody has ever written the link http://example.com/f8fwy down anywhere; nonetheless it could point to a real page (and, if you are lucky, to a site or file that has never been linked to either).
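A minimal sketch of that guessing idea (the alphabet, code length, and number of attempts are assumptions; probing the resulting URLs against a server is then the same kind of request loop as any dictionary attack):

    import random
    import string

    ALPHABET = string.ascii_lowercase + string.digits

    def short_codes(count, length=5):
        # Generate TinyURL-style candidate paths such as /f8fwy.
        for _ in range(count):
            yield "/" + "".join(random.choices(ALPHABET, k=length))

    print(list(short_codes(5)))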
However, since search engine operators also run mail services (Google) or chat services (Microsoft with Skype), I think this technique has become less important, because they can collect deep-web links directly from those services.
Upvotes: 2