Reputation: 2222
My application makes numerous HTTP requests. Without writing a regular expression, how do I parse Content-Type header values? For example:
text/html; charset=UTF-8
For context, here is my code for fetching things from the Internet:
from requests import head
foo = head("http://www.example.com")
The output I am expecting is similar to what the methods do in mimetools. For example:
x = magic("text/html; charset=UTF-8")
Will output:
x.getparam('charset') # UTF-8
x.getmaintype() # text
x.getsubtype() # html
Upvotes: 20
Views: 13515
Reputation: 1067
Building on the answer from @philip-couling, we can create just a ContentTypeHeader object and get the MIME (media) type as well as the parameters:
def parse_content_type(content_type):
    from email.policy import EmailPolicy
    header = EmailPolicy.header_factory('content-type', content_type)
    return (header.content_type, dict(header.params))
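As a rough usage sketch, the header object returned by the factory also appears to expose the main type and subtype separately, which is closer to what the question asks for. The hard-coded header string below is only a stand-in for a real response header:

```python
from email.policy import EmailPolicy

# Assumption: a literal value stands in for response.headers['Content-Type'].
raw_value = "text/html; charset=UTF-8"

# header_factory picks the ContentTypeHeader class for this header name.
header = EmailPolicy.header_factory("content-type", raw_value)

content_type = header.content_type  # full type without parameters
maintype = header.maintype          # part before the slash
subtype = header.subtype            # part after the slash
params = dict(header.params)        # parameters such as charset

print(content_type, maintype, subtype, params)
```

Parameter names are lowercased by the parser, while values such as the charset keep their original case.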
Upvotes: 2
Reputation: 14893
Python has this built in. It's in the email module.
MIME and MIME types are an email standard that has been adopted in other contexts: "Multipurpose Internet Mail Extensions" (see RFC 2045).
The simplest way to do this reliably is to use an email parser:
from email.message import Message

_CONTENT_TYPE = "content-type"

def parse_content_type(content_type: str) -> tuple[str, dict[str, str]]:
    email = Message()
    email[_CONTENT_TYPE] = content_type
    params = email.get_params()
    # The first param is the MIME type; the later ones are attributes like "charset"
    return params[0][0], dict(params[1:])
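For the mimetools-style accessors from the question, the same Message class already offers get_content_maintype(), get_content_subtype() and get_param(). A minimal sketch, where the helper name content_type_message is made up for illustration:

```python
from email.message import Message


def content_type_message(content_type: str) -> Message:
    # Hypothetical helper: wrap a raw Content-Type value in a Message
    # so that the standard accessors can be used on it.
    msg = Message()
    msg["content-type"] = content_type
    return msg


x = content_type_message("text/html; charset=UTF-8")
print(x.get_param("charset"))    # UTF-8
print(x.get_content_maintype())  # text
print(x.get_content_subtype())   # html
```

This avoids splitting on "/" by hand, at the cost of building a throwaway Message per header.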
Upvotes: 9
Reputation: 7855
requests doesn't give you an interface to parse the content type, unfortunately, and the standard library on this stuff is a bit of a mess. So I see two options:
Option 1: Go use the python-mimeparse third-party library.
Option 2: To separate the MIME type from options like charset, you can use the same technique that requests uses to parse type/encoding internally: use cgi.parse_header.
import cgi
import requests
response = requests.head('http://example.com')
mimetype, options = cgi.parse_header(response.headers['Content-Type'])
The rest should be simple enough to handle with a split:
maintype, subtype = mimetype.split('/')
Update: As of Mar 2023, the current official way of doing this, now that cgi is deprecated, is using email.message.Message. See Philip Couling's answer. I agree with Philip that it's kind of gross.
Upvotes: 20
Reputation: 203
Since requests 2.19.0, there is a requests.utils._parse_content_type_header function that splits a Content-Type header into a parameter-less content-type and a dictionary of parameters. This function does not split the content-type into main type and sub-type.
>>> requests.utils._parse_content_type_header("text/html; charset=UTF-8")
('text/html', {'charset': 'UTF-8'})
Note that the name of this function starts with an underscore: it’s supposed to be a private function, so I guess it might be dropped in a future release. For the record, a request to make it a public interface was rejected: https://github.com/psf/requests/issues/6362
Upvotes: 3
Reputation: 753
Your question is a bit unclear. I assume that you are using some sort of web application framework such as Django or Flask?
Here is an example of how to read Content-Type using Flask:
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def test():
    # Return the raw Content-Type header of the incoming request (empty string if absent)
    return request.headers.get('Content-Type', '')

if __name__ == "__main__":
    app.run()
Upvotes: -1