How do I know what to name a file downloaded using HTTP?

Question

I am writing an HTTP client downloader in Python. I can download a file such as http://www.google.com/images/srpr/logo11w.png just fine, but I'm not sure what to actually name the saved file.

There is of course the filename at the end of the URL, but is this always reliable?
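Extracting that last segment takes only a few lines with Python's standard library. The sketch below uses a hypothetical helper, filename_from_url, and made-up example.com URLs. It shows that the result can be empty, or can name a script rather than the file actually being served:

```python
from posixpath import basename
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL, percent-decoded ('' if there is none)."""
    path = urlsplit(url).path       # drops scheme, host, query string and fragment
    return unquote(basename(path))  # e.g. 'a%20b.pdf' -> 'a b.pdf'


if __name__ == "__main__":
    for url in [
        "http://www.google.com/images/srpr/logo11w.png",  # 'logo11w.png'
        "http://www.google.com",                           # '' (no path at all)
        "http://example.com/files/",                       # '' (trailing slash)
        "http://example.com/download.php?id=42",           # 'download.php' (the script, not the file)
        "http://example.com/a%20b.pdf",                    # 'a b.pdf'
    ]:
        print(repr(filename_from_url(url)))
```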

RomanK · Accepted Answer

If I recall correctly, wget uses the following heuristic:

1. If a Content-Disposition header exists, take the filename from there (wget only does this when run with --content-disposition).
2. If the URL has a filename component (e.g. http://myserver/filename), use that.
3. If there is no filename component (e.g. http://www.google.com), derive the filename from the Content-Type header (such as index.html for text/html).
4. In all cases, if a file with that name already exists in the directory, either add a numerical suffix (wget's default produces index.html.1, index.html.2, and so on) or overwrite it, depending on configuration.
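A minimal Python sketch of those four steps follows. It assumes the response headers are available as an email.message.Message, which is what urllib.request exposes as response.headers (http.client.HTTPMessage is a subclass). The function name choose_filename and its directory parameter are illustrative and not part of any library. Stripping directory parts from server-supplied names is an extra safety step that the heuristic above does not mention.

```python
import mimetypes
import os
from email.message import Message
from posixpath import basename
from urllib.parse import unquote, urlsplit


def choose_filename(url: str, headers: Message, directory: str = ".") -> str:
    """Return a local path for a download, loosely following wget's heuristic."""
    # Step 1: filename parameter of Content-Disposition (None if absent).
    name = headers.get_filename()

    # Step 2: last path segment of the URL, percent-decoded.
    if not name:
        name = unquote(basename(urlsplit(url).path))

    # Step 3: "index" plus an extension guessed from Content-Type.
    # (Message.get_content_type() returns "text/plain" if the header is missing.)
    if not name:
        name = "index" + (mimetypes.guess_extension(headers.get_content_type()) or "")

    # Extra safety step (an assumption, not part of the heuristic above): drop any
    # directory parts a server might put in the name, e.g. "../../etc/passwd".
    name = name.replace("\\", "/").split("/")[-1]
    if name in ("", ".", ".."):
        name = "index"

    # Step 4: wget-style numeric suffix instead of overwriting (name.1, name.2, ...).
    # Note: check-then-create is not atomic; another process could still race us.
    path = os.path.join(directory, name)
    n = 1
    while os.path.exists(path):
        path = os.path.join(directory, f"{name}.{n}")
        n += 1
    return path
```

With urllib.request, a call might look like `choose_filename(response.geturl(), response.headers)`. Whether to pass the final URL after redirects or the one originally requested is a design choice: by default wget names the file after the original URL, and only uses the redirect target when given --trust-server-names.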

There are plenty of other flags that control further heuristics, such as --adjust-extension, which appends .html to pages served as HTML from .asp or CGI URLs.
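According to the wget manual, --adjust-extension (-E) appends .html when a file of type text/html or application/xhtml+xml is downloaded and its URL does not already end in .htm or .html (case-insensitive). A minimal Python sketch of just that rule follows; the name adjust_extension is hypothetical, it checks the local name rather than the URL, and it ignores wget's similar handling of text/css.

```python
import re

# Names that already end in .htm or .html, in any letter case.
_HTML_SUFFIX = re.compile(r"\.[Hh][Tt][Mm][Ll]?$")


def adjust_extension(name: str, content_type: str) -> str:
    """Append '.html' to HTML/XHTML responses whose name lacks an HTML suffix."""
    # Ignore parameters such as "; charset=utf-8" and compare the bare media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml") and not _HTML_SUFFIX.search(name):
        return name + ".html"
    return name
```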

In short, it depends on how far you want to go. For most purposes, the first two steps plus a basic Content-Type-to-extension mapping should be enough.
