Reputation: 1199
I want to create a simple Python-based utility that can collect all the download URLs from a web page and download their content. I found several ways of doing this, and the best I found was "urllib". Unfortunately, I can't save the files with the proper extensions because the URLs look like this:
http://example.com/2w3xa75
But the content can be in different formats, e.g. .mp3, .ogg, etc.
How can I identify the type and save the content with the correct extension?
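For illustration, this is roughly the kind of link collection I mean; a minimal sketch with urllib and html.parser (the page URL below is just a placeholder):
from urllib.request import urlopen
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    # collect href values from <a> tags
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)

page = urlopen('http://example.com/downloads')  # placeholder page URL
collector = LinkCollector()
collector.feed(page.read().decode('utf-8', errors='replace'))
print(collector.links)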
Upvotes: 0
Views: 99
Reputation: 10431
You can use requests and mimetypes. The idea is to extract the Content-Type HTTP header and ask mimetypes to guess the related extension.
I will use this question's URL as an example (it doesn't provide an extension):
import requests
import mimetypes

query = requests.get('https://stackoverflow.com/questions/45488909/retrieve-files-from-urls-and-save-those-with-correct-extension')
# the header looks like 'text/html; charset=utf-8'; keep only the media type
content_type = query.headers['Content-Type']
print(mimetypes.guess_extension(content_type.split(';')[0]))
Output:
.html
A Content-Type header looks like 'text/html; charset=utf-8', but mimetypes.guess_extension expects only the first part (text/html), which is why I split it.
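To actually save a downloaded file with the guessed extension, a minimal sketch could look like this (the helper name and the output naming scheme are my own choices, and the short URL in the comment is the placeholder from the question):
import mimetypes
import requests

def download_with_extension(url, base_name):
    # fetch the resource and guess an extension from its Content-Type
    response = requests.get(url)
    content_type = response.headers.get('Content-Type', '')
    extension = mimetypes.guess_extension(content_type.split(';')[0].strip()) or ''
    path = base_name + extension
    # write the raw bytes to disk under the guessed name
    with open(path, 'wb') as handle:
        handle.write(response.content)
    return path

# e.g. download_with_extension('http://example.com/2w3xa75', 'track01')
Note that guess_extension returns None when it doesn't recognise the type, hence the or '' fallback.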
Upvotes: 1