Reputation: 473
I am writing a web service client using the requests library. The server responds with multipart/form-data that contains a file and JSON text, and I have no idea how to parse it. Is there a proper library for parsing multipart/form-data in Python, or should I write a parser on my own?
My code:
data = {
    "prototypeModel": ('prototypeModel', open(prototypeModel, 'rb'), 'application/octet-stream', {'Expires': '0'}),
    "mfcc_1": ('mfcc', open(mfcc_1, 'rb'), 'application/octet-stream', {'Expires': '0'}),
    "mfcc_2": ('mfcc', open(mfcc_2, 'rb'), 'application/octet-stream', {'Expires': '0'}),
    "mfcc_3": ('mfcc', open(mfcc_3, 'rb'), 'application/octet-stream', {'Expires': '0'}),
}
print('---------------------- start enroll ----------------------')
testEnrollResponse = requests.post(server + sessionID, files=data, json=declaredParameters)
b'\r\n--c00750d1-8ce4-4d29-8390-b50bf02a92cc\r\nContent-Disposition: form-data; name="playbackHash"\r\nContent-Type: application/octet-stream\r\n\r\n\x16\x00\x00\x00\x00\x00\x00\x00serialization::archive\n\x00\x04\x08\x04 .... x00\x00R\x94\x9bp\x8c\x00\r\n--c00750d1-8ce4-4d29-8390-b50bf02a92cc\r\nContent-Disposition: form-data; name="usersMFCC"\r\nContent-Type: application/octet-stream\r\n\r\n\x16\x00\x00\x00\x00\x00\x00\x00serialization::archive\n\x00\x04\x08\x04\x08\x01\x00\x00\x00\x00\x00\x00\x00\x00\xf8\x16\x00\x00\x00\x00\x00\x00u\xbd\xb4/\xda1\xea\xbf\x0f\xed\xa2<\xc9\xf8\xe7\xbf?\xd5\xf06u\xe7\xf0\xbf\xd4\x8d\xd4\xa1F\xbe\x03@\x85X!\x19\xd8A\x06@\x8co\xf7\r .....
x80\xd9\x95Yxn\xd0?\r\n--c00750d1-8ce4-4d29-8390-b50bf02a92cc\r\nContent-Disposition: form-data; name="scoreAndStatus"\r\nContent-Type: application/json; charset=utf-8\r\n\r\n{"lexLikelihood":1.544479046897232,"overallScore":-nan,"playbackLikelihood":-inf,"status":{"errorCode":0,"errorMessage":""}}\r\n--c00750d1-8ce4-4d29-8390-b50bf02a92cc--\r\n'
I replaced most of the binary data with " ..... ".
Upvotes: 24
Views: 60817
Reputation: 1494
Code sample for Flask, using https://github.com/defnull/multipart:
import flask
import multipart as mp
from multipart import to_bytes
try:
    from io import BytesIO
except ImportError:
    from StringIO import StringIO as BytesIO
@app.route('/', methods=["GET", "POST"])
def index():
    ...
    elif flask.request.method == "POST":
        data = flask.request.data
        # The boundary is the first line of the body without its leading "--".
        s = data.split("\r")[0][2:]
        p = mp.MultipartParser(BytesIO(to_bytes(data)), s)
        blob = p.parts()[0].value
        # Write the first part back out as raw bytes.
        with open("file.bin", "wb") as f:
            f.write(blob.encode("latin-1"))
Upvotes: 10
Reputation: 713
As far as I know, the easiest way is to use requests-toolbelt, as originally suggested by @Ian Stapleton Cordasco.
Here is my full example to deal with this task:
import requests
payload = f"""<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:aa="http://someservice.com/">
<soapenv:Header/>
<soapenv:Body>
<aa_GetFile>
<FileID>123</FileID>
</aa_GetFile>
</soapenv:Body>
</soapenv:Envelope>"""
headers = {"Content-Type": "text/xml; charset=utf-8"}
response = requests.post("http://localhost:8080/Service?WSDL", data=payload, headers=headers, verify=False)
from requests_toolbelt.multipart import decoder
multipart_data = decoder.MultipartDecoder.from_response(response)
file_content: bytes = b""
for part in multipart_data.parts:
    is_file_part: bool = False
    # Header keys and values are bytes, so decode them before comparing.
    for key, value in part.headers.items():
        if key.decode("utf8") == "Content-Type":
            if value.decode("utf8") == "application/octet-stream":
                is_file_part = True
    if is_file_part:
        file_content += part.content
print("file_content = " + str(file_content))
And the result is:
file_content = b'abcdef'
which is the binary content of the file downloaded from the service. :)
Upvotes: 1
Reputation: 376
A working example of parsing multipart data follows. You can try it out at the interactive Python prompt.
import email
msg = email.message_from_string('''\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=" XXXX"

-- XXXX
Content-Type: text/plain

-- XXXX
Content-Type: text/plain

-- XXXX--
''')
msg.is_multipart()
Once you know it's working on your system, you can build your own email message out of the POST data and parse it the same way. If you have the raw POST body as a string, the rest of the necessary information can be found in the request headers. Do not put extraneous indentation inside the block string, because it would become part of the message.
# Prepend minimal MIME headers; the blank line separates headers from the body.
epost_data = '''\
MIME-Version: 1.0
Content-Type: %s

%s''' % (self.headers['content-type'], post_data)
msg = email.message_from_string(epost_data)
if msg.is_multipart():
    for part in msg.get_payload():
        name = part.get_param('name', header='content-disposition')
        filename = part.get_param('filename', header='content-disposition')
        # print('name %s' % name)  # "always" there
        # print('filename %s' % filename)  # only there for files...
        payload = part.get_payload(decode=True)
        print(payload[:100])  # output the first 100 bytes
The first %s will be replaced with the content type, and the second with post_data. You can then write the payload to a file, etc.
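For Python 3, where a response body usually arrives as bytes, the same idea can be sketched with email.message_from_bytes. This is only a minimal sketch under stated assumptions: parse_multipart is a hypothetical helper name (not part of any library), it keeps only the last part when two parts share a name, and it has not been validated against arbitrary binary payloads.

```python
import email


def parse_multipart(body: bytes, content_type: str) -> dict:
    """Hypothetical helper: map each part's form-data name to its raw bytes."""
    # Prepend the minimal MIME headers so the email parser can find the boundary.
    header = ("MIME-Version: 1.0\r\nContent-Type: %s\r\n\r\n" % content_type).encode("ascii")
    msg = email.message_from_bytes(header + body)
    parts = {}
    if msg.is_multipart():
        for part in msg.get_payload():
            name = part.get_param("name", header="content-disposition")
            # decode=True returns bytes; parts with a repeated name overwrite each other.
            parts[name] = part.get_payload(decode=True)
    return parts
```

With requests, this could be called as parse_multipart(response.content, response.headers["Content-Type"]), assuming the server sets a multipart Content-Type with a boundary parameter.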
Be careful to consider the security implications of saving a file. You may not be able to trust the posted file name: on some web servers it could start with ../../filename.sh, for example, so if you try to write /my-folder/../../filename.sh the attacker could potentially place a malicious file outside the location where you intend to store files. Strong validation that the file is of an allowed type before trusting the file itself is also recommended. You do not want to let attackers overwrite any file on your system.
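One possible way to guard against the ../../ case is sketched below. It is an illustration rather than a complete defence: safe_destination is a hypothetical helper name, it only deals with the path and does nothing to validate the file type or content, and it assumes you are happy to discard any directory part of the posted name.

```python
import os


def safe_destination(upload_dir: str, posted_name: str) -> str:
    """Hypothetical helper: return a path inside upload_dir or raise ValueError."""
    # Treat backslashes as separators too, then keep only the final component.
    base = os.path.basename(posted_name.replace("\\", "/"))
    if base in ("", ".", ".."):
        raise ValueError("unusable file name: %r" % posted_name)
    root = os.path.realpath(upload_dir)
    dest = os.path.realpath(os.path.join(root, base))
    # Defence in depth: after resolving symlinks the path must still be under root.
    if os.path.commonpath([root, dest]) != root:
        raise ValueError("path escapes upload directory: %r" % posted_name)
    return dest
```

Called with "/my-folder" and "../../filename.sh", this keeps only filename.sh and places it under /my-folder instead of two directories above it.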
Upvotes: 9
Reputation: 28757
If you're receiving a multipart/form-data response, you can parse it using the requests-toolbelt library. Install it first:
$ pip install requests-toolbelt
After installing it, decode the response:
import requests
from requests_toolbelt.multipart import decoder
testEnrollResponse = requests.post(...)
multipart_data = decoder.MultipartDecoder.from_response(testEnrollResponse)
for part in multipart_data.parts:
    print(part.content)  # Alternatively, part.text if you want unicode
    print(part.headers)
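One thing to watch for once the parts are separated: the scoreAndStatus part shown in the question contains -nan and -inf, which Python's json.loads rejects (it only accepts NaN, Infinity and -Infinity as non-standard constants). A rough sketch of a workaround follows. loads_lenient is a hypothetical helper name, and the regular expression is a simplification that could also rewrite a matching token inside a string value, so treat it as a starting point only.

```python
import json
import re

# Matches a bare nan/inf value (optionally negative) that follows ':', ',', '['
# or whitespace and is followed by ',', ']' or '}'.
_NONSTANDARD = re.compile(r"(?<=[:\[,\s])(-?)(nan|inf)(?=\s*[,\]}])")


def loads_lenient(text: str):
    """Hypothetical helper: json.loads that also tolerates nan, -nan, inf, -inf."""
    def fix(match):
        if match.group(2) == "nan":
            return "NaN"  # the sign of a NaN is dropped
        return match.group(1) + "Infinity"
    return json.loads(_NONSTANDARD.sub(fix, text))
```

For example, loads_lenient(part.text) could replace json.loads(part.text) for that part.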
Upvotes: 30