Reputation:
Hi, I'm trying to code a tool that searches for directories and files.
So far the tool searches for directories, but I need help making it search for files on websites.
Any idea how this can be done in Python?
Upvotes: 0
Views: 211
Reputation: 391922
If you're getting information on your own website for presentation in your own web application, you should use os.walk.
See http://www.python.org/doc/2.5.2/lib/os-file-dir.html for more information.
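For the local case, a minimal sketch of what os.walk gives you. The helper name list_tree and the choice to return paths relative to the starting directory are my own assumptions for illustration, not anything the question specifies.

```python
import os


def list_tree(root):
    """Return (dirs, files): sorted paths relative to root (hypothetical helper)."""
    dirs, files = [], []
    # os.walk yields one (current_dir, subdir_names, file_names) tuple per directory.
    for current, subdirs, filenames in os.walk(root):
        rel = os.path.relpath(current, root)
        for name in subdirs:
            dirs.append(os.path.normpath(os.path.join(rel, name)))
        for name in filenames:
            files.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(dirs), sorted(files)
```

Calling list_tree("/var/www") (the path is only an example) would return every directory and file below that folder. This only works on a filesystem the script can read directly, not over HTTP.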
Upvotes: 0
Reputation: 54474
You cannot get a directory listing on a website.
Pedantically, HTTP has no notion of directory.
Practically, WebDAV provides a directory listing verb (PROPFIND), so you can use that if WebDAV is enabled.
Otherwise, the closest thing you can do is similar to what recursive wget does: get a page, parse the HTML, look for hyperlinks (a/@href in XPath), filter out hyperlinks that do not point to a URL below the current page, and recurse into the remaining URLs.
You can do further filtering, depending on your use case, such as removing the query part of the URL (anything after the first ?).
When the server has a directory listing feature enabled, this gives you something usable. This also gives you something usable if the website has no directory listing but is organized in a sensible way.
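The wget-style approach described above could be sketched roughly like this, using only the standard library. The names LinkCollector, child_links, fetch_html and crawl are hypothetical, as is the limit of 100 pages. "Below the current page" is interpreted here as same scheme and host, with a path that starts with the page's directory, which is one reasonable reading but still an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag (the a/@href set)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def child_links(page_url, html):
    """Return absolute, query-free URLs in html that sit below page_url."""
    page = urlsplit(page_url)
    # Everything up to the last "/" is treated as the page's "directory".
    base_path = page.path.rsplit("/", 1)[0] + "/"
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        target = urlsplit(urljoin(page_url, href))
        same_site = (target.scheme, target.netloc) == (page.scheme, page.netloc)
        below = target.path.startswith(base_path) and target.path != page.path
        if same_site and below:
            # Drop the query string and fragment before storing the URL.
            clean = urlunsplit((target.scheme, target.netloc, target.path, "", ""))
            if clean not in found:
                found.append(clean)
    return found


def fetch_html(url):
    """Download url; return its text, or "" if the response is not HTML."""
    with urlopen(url, timeout=10) as response:
        if response.headers.get_content_type() != "text/html":
            return ""
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(start_url, fetch=fetch_html, limit=100):
    """Visit start_url and everything linked below it; return sorted URLs."""
    seen, queue = set(), [start_url]
    while queue and len(seen) < limit:  # limit is an arbitrary safety cap
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        queue.extend(child_links(url, fetch(url)))
    return sorted(seen)
```

Something like crawl("http://example.com/files/") would then return the URLs it managed to reach. It downloads every linked file just to check whether it is HTML, has no error handling for failed requests, and ignores robots.txt, so treat it as a starting point only.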
Upvotes: 1
Reputation: 34347
You can only do this if you have permission to browse directories on the site and no default page exists.
Upvotes: 1
Reputation: 309
Is this tool scanning the directories of your own website (the one on which the tool is running), or external sites?
Upvotes: 1