Reputation: 49
I want to determine a file's format by checking the Content-Type header returned by a HEAD request. I have a download link for a CSV file, but the reported content type is text/html, and r.status_code is 404.
If I request the same URL in the terminal with curl -v, I get the right content type. Does anyone know how to fix this in Python? Thank you in advance!
Here is what I have so far:
import requests
url = "https://stats.oecd.org/sdmx-json/data/DP_LIVE/.GDP.../OECD?contentType=csv&detail=code&separator=comma&csv-lang=en"
r = requests.head(url)
headerDict = r.headers
print(headerDict)
and I get back:
{'Cache-Control': 'private', 'Content-Length': '4941', 'Content-Type': 'text/html; charset=utf-8', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Headers': 'Content-Type', 'Access-Control-Allow-Methods': 'POST,GET,OPTIONS', 'Date': 'Fri, 22 Oct 2021 08:54:31 GMT', 'Connection': 'close'}
Upvotes: 0
Views: 520
Reputation: 614
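The server answers your HEAD request with a 404 error page, which is why you see text/html. A GET request, which is what curl -v sends by default, returns the expected content type: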
import requests
import mimetypes
response = requests.get('https://stats.oecd.org/sdmx-json/data/DP_LIVE/.GDP.../OECD?contentType=csv&detail=code&separator=comma&csv-lang=en')
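# Strip parameters such as "; charset=utf-8"; guess_extension expects a bare MIME type
content_type = response.headers['content-type'].split(';')[0].strip()
extension = mimetypes.guess_extension(content_type)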
print(content_type)
print(extension)
Output:
text/csv
.csv
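If you only need the headers and would rather not download the whole file, a streamed GET may be enough. This is a minimal sketch, not a definitive solution: the function names extension_from_content_type and fetch_content_type and the 10-second timeout are my own choices for illustration, and it assumes the server keeps returning the correct headers for GET.

```python
import mimetypes


def extension_from_content_type(header_value):
    """Return a file extension such as '.csv' for a Content-Type value, or None."""
    # Drop parameters such as "; charset=utf-8" before the lookup,
    # because mimetypes.guess_extension expects a bare MIME type.
    mime_type = header_value.split(";", 1)[0].strip().lower()
    return mimetypes.guess_extension(mime_type)


def fetch_content_type(url, timeout=10):
    """Return the Content-Type of a GET response without reading the body."""
    # Imported here so the helper above stays usable without requests installed.
    import requests

    # stream=True defers the body download; the context manager closes the
    # connection once the headers have been read.
    with requests.get(url, stream=True, timeout=timeout) as response:
        # Raise on 4xx/5xx instead of reporting the type of an error page.
        response.raise_for_status()
        return response.headers.get("Content-Type", "")
```

You would then call extension_from_content_type(fetch_content_type(url)) with the download link from the question.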
Upvotes: 0