Python header request wrong content type/Error 404

Question

I want to determine a file's format by checking the Content-Type returned for a HEAD request. I have a download link for a CSV file, but the Content-Type comes back as text/html, and r.status_code is 404.

If I request the same URL in the terminal with curl -v, I get the right content type. Does anyone know how to fix this in Python? Thank you in advance!

Here is what I did so far:

import requests

url = "https://stats.oecd.org/sdmx-json/data/DP_LIVE/.GDP.../OECD?contentType=csv&detail=code&separator=comma&csv-lang=en"

# Send a HEAD request and inspect only the response headers
r = requests.head(url)
headerDict = r.headers
print(headerDict)

and I get back:

{'Cache-Control': 'private', 'Content-Length': '4941', 'Content-Type': 'text/html; charset=utf-8', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Headers': 'Content-Type', 'Access-Control-Allow-Methods': 'POST,GET,OPTIONS', 'Date': 'Fri, 22 Oct 2021 08:54:31 GMT', 'Connection': 'close'}

Shreyas Prakash · Accepted Answer

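Send a GET request instead of a HEAD request and read the Content-Type header from that response. This server appears to answer HEAD with a 404 HTML error page, which would explain the text/html you saw. mimetypes.guess_extension then maps the MIME type to a file extension: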

import requests
import mimetypes

response = requests.get('https://stats.oecd.org/sdmx-json/data/DP_LIVE/.GDP.../OECD?contentType=csv&detail=code&separator=comma&csv-lang=en')
content_type = response.headers['content-type']
extension = mimetypes.guess_extension(content_type)
print(content_type)
print(extension)

Output:

text/csv
.csv
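
One caveat with the snippet above: a Content-Type header can carry parameters (the 404 response in the question had "text/html; charset=utf-8"), and mimetypes.guess_extension expects a bare "type/subtype" string, so it may return None for such a value. The following is a minimal sketch of a more defensive lookup. The helper names extension_from_content_type and extension_if_ok are made up for illustration and are not part of requests or mimetypes.

```python
import mimetypes
from typing import Optional


def extension_from_content_type(content_type: str) -> Optional[str]:
    """Return a file extension such as '.csv' for a Content-Type header value.

    Hypothetical helper for illustration, not part of any library.
    """
    # Drop parameters such as "; charset=utf-8" before the lookup,
    # because mimetypes expects a bare "type/subtype" string.
    mime_type = content_type.split(";", 1)[0].strip().lower()
    return mimetypes.guess_extension(mime_type)


def extension_if_ok(status_code: int, content_type: str) -> Optional[str]:
    """Only trust the Content-Type when the request succeeded (hypothetical helper)."""
    # An error page, like the 404 in the question, describes the error
    # document itself (text/html), not the file you asked for.
    if not 200 <= status_code < 300:
        return None
    return extension_from_content_type(content_type)


print(extension_from_content_type("text/csv; charset=utf-8"))
print(extension_if_ok(404, "text/html; charset=utf-8"))
```

With a requests response you would call extension_if_ok(response.status_code, response.headers.get("content-type", "")). If the file is large, passing stream=True to requests.get defers downloading the body until you access it, so you can read the headers and then close the response.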
