user1031743

Reputation: 329

Are crawlers able to find all directories on a website?

I want to back up some files in the root of my website, at a path like /www/mysite/myfolder/myfile.xls. Are crawlers able to find that directory, even if it is not used for files that the website itself needs? Thank you.

Upvotes: 0

Views: 1472

Answers (2)

Ahmad

Reputation: 66

Even without a link, simple file names can be found by brute force with the help of dictionaries (word lists). There are tools for such attacks, like DirBuster.
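To make the dictionary idea concrete, here is a minimal sketch of how candidate URLs could be generated from a word list. The function name `candidate_urls`, the word list and the extension list are all made up for illustration; this is not how DirBuster itself is implemented, and the sketch only builds the URLs without sending any requests.

```python
from itertools import product
from urllib.parse import urljoin


def candidate_urls(base_url, words, extensions):
    """Yield every combination of base_url + word + extension."""
    for word, ext in product(words, extensions):
        yield urljoin(base_url, word + ext)


# Hypothetical word list and extensions, chosen only for this example.
words = ["backup", "myfolder/myfile"]
extensions = ["", ".xls", ".zip"]

candidates = list(candidate_urls("http://example.com/", words, extensions))
for url in candidates:
    print(url)
```

A real tool would request each candidate and look at the HTTP status code. The point is that a guessable name such as myfolder/myfile.xls does not need a link to be found.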

Upvotes: 1

aufziehvogel

Reputation: 7297

A web crawler that does not use brute-force or dictionary trials (explained later) can find a file only if at least one link to that file exists on a page the crawler has already visited.

From the path /www/myfolder/myfile.xls I assume there might be another problem. A web crawler can only find files that are publicly available. Not all files under /www, /var/www, /htdocs or whatever directory is used are necessarily publicly available. There might be a structure like /www/mysite/public, where only public is reachable from the web. With such a structure one can make sure that files in /www/mysite cannot be accessed without permission checks performed by PHP before the download.
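As a rough illustration of that layout, the following sketch maps a file system path to the URL path it would be served under, or to None if it lies outside the public directory. The function name `url_path_for` and the example paths are assumptions, and the sketch expects already normalized absolute paths (no `..` segments). The real mapping depends on your web server configuration.

```python
from pathlib import PurePosixPath


def url_path_for(file_path, document_root):
    """Return the URL path for file_path, or None if it is outside document_root."""
    try:
        relative = PurePosixPath(file_path).relative_to(document_root)
    except ValueError:
        # The file is not below the document root, so the web server
        # has no URL for it and no crawler can request it directly.
        return None
    return "/" + relative.as_posix()


# Assumed layout: only /www/mysite/public is exposed by the web server.
root = "/www/mysite/public"
print(url_path_for("/www/mysite/public/index.html", root))
print(url_path_for("/www/mysite/myfolder/myfile.xls", root))
```

With this assumed layout the backup file has no URL at all, so neither following links nor guessing names can reach it.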

So you have to check whether

  1. your directory can be accessed via HTTP, FTP or some other protocol
  2. a link to your file exists on another web page the crawler can find (technically the crawler needs one start page, of course)
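The second condition can be sketched with a toy crawler that only follows links. To keep the example self-contained it crawls an in-memory dictionary of pages instead of making HTTP requests. The names `LinkCollector`, `discover` and the sample pages are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(urljoin(self.page_url, value))


def discover(pages, start_url):
    """Return every URL reachable from start_url by following links."""
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # not an HTML page we hold, e.g. a linked file
        collector = LinkCollector(url)
        collector.feed(html)
        for link in collector.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


# Hypothetical site: report.xls is linked, myfile.xls is not.
pages = {
    "http://example.com/": '<a href="/about.html">About</a>',
    "http://example.com/about.html": '<a href="/files/report.xls">Report</a>',
}
found = discover(pages, "http://example.com/")
print(sorted(found))
```

The linked report.xls shows up in the result, while an unlinked file such as /myfolder/myfile.xls never does, even if the server would deliver it on request.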

Exception: brute-force trials

There is an exception in which files without a link can also be found: search engines could try to find files by extending the already known URL space of a domain with known or random words. This can of course only be done sporadically. Consider a TinyURL-style generator. Its URLs usually consist of a short known URL and some random characters. A search engine could try out these short character sequences, hoping to find files in the so-called deep web. E.g. it's possible nobody has ever written the link http://example.com/f8fwy down anywhere; nonetheless it could lead to a real page (if you are lucky, some website or file that has never been linked to either).
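A small sketch shows why such trials can only be done sporadically. It assumes the short codes use lowercase letters and digits, which is a guess based on the f8fwy example and not a statement about any real service. The names `ALPHABET`, `short_urls` and `keyspace` are made up.

```python
import string
from itertools import islice, product

# Assumed alphabet: 26 lowercase letters plus 10 digits.
ALPHABET = string.ascii_lowercase + string.digits


def short_urls(base_url, length):
    """Lazily yield every possible short URL with a code of the given length."""
    for chars in product(ALPHABET, repeat=length):
        yield base_url + "".join(chars)


def keyspace(length):
    """Number of distinct codes of the given length."""
    return len(ALPHABET) ** length


first_three = list(islice(short_urls("http://example.com/", 5), 3))
print(first_three)
print(keyspace(5))
```

Under this assumption a five character code already allows 36^5 = 60,466,176 candidates per domain, so a crawler can sample such a space but hardly enumerate it for every site.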

However, with search engine companies also offering mail services (Google) or chat services (Microsoft, Skype), I think this technique has become less important, because they could obtain deep web links through these services.

Upvotes: 2
