NIK

Reputation: 1199

Retrieve files from URLs and save those with correct extension

I want to create a simple Python-based utility that collects all the download URLs from a web page and downloads their content. I found several ways of doing this, and the best one I found was urllib. Unfortunately, I can't save the files with the proper extensions, because the URLs look like this:

http://example.com/2w3xa75

But the content can be in different formats, e.g. .mp3, .ogg, etc.

How can I identify the content type and save each file with the correct extension?

Upvotes: 0

Views: 99

Answers (1)

Arount

Reputation: 10431

You can use requests and mimetypes.

The idea is to extract the Content-Type HTTP header and ask mimetypes to guess the related extension.

I will use this question's URL as an example (it doesn't include an extension):

import requests
import mimetypes

# Fetch the resource; the server reports the content type in its headers.
query = requests.get('https://stackoverflow.com/questions/45488909/retrieve-files-from-urls-and-save-those-with-correct-extension')
content_type = query.headers['Content-Type']

# Keep only the MIME type (drop parameters such as "; charset=utf-8")
# and let mimetypes map it to an extension.
print(mimetypes.guess_extension(content_type.split(';')[0]))

Output:

.html

A Content-Type header looks like 'text/html; charset=utf-8', but mimetypes.guess_extension expects only the first part (text/html), which is why I split it.
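
For completeness, here is a minimal sketch of how the extension guess could be combined with actually saving the downloaded bytes. The save_from_url helper name and the file-naming scheme (last URL segment plus guessed extension) are assumptions for illustration, not part of the answer above:

import mimetypes
import os
import requests

def save_from_url(url, dest_dir='.'):
    # Download the resource and keep the raw bytes.
    response = requests.get(url)
    response.raise_for_status()

    # Guess an extension from the Content-Type header, ignoring
    # parameters such as "; charset=utf-8"; fall back to no extension.
    content_type = response.headers.get('Content-Type', '')
    extension = mimetypes.guess_extension(content_type.split(';')[0].strip()) or ''

    # Name the file after the last URL segment and append the extension.
    name = url.rstrip('/').rsplit('/', 1)[-1] or 'download'
    path = os.path.join(dest_dir, name + extension)

    with open(path, 'wb') as f:
        f.write(response.content)
    return path

For example, save_from_url('http://example.com/2w3xa75') would save the response as 2w3xa75 plus whatever extension mimetypes guesses from the server's Content-Type.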

Upvotes: 1
