Reputation: 63299
I have a program that accepts HTTP POST uploads of files and writes the entire POST request into a file. I want to write a script that strips the HTTP headers and leaves only the binary file data. How can I do that?
The file content is below (the data between Content-Type: application/octet-stream and ------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3 is what I want):
POST /?user_name=vvvvvvvv&size=837&file_name=logo.gif& HTTP/1.1^M
Accept: text/*^M
Content-Type: multipart/form-data; boundary=----------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
User-Agent: Shockwave Flash^M
Host: 192.168.0.198:9998^M
Content-Length: 1251^M
Connection: Keep-Alive^M
Cache-Control: no-cache^M
Cookie: cb_fullname=ddddddd; cb_user_name=cdc^M
^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Filename"^M
^M
logo.gif^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Filedata"; filename="logo.gif"^M
Content-Type: application/octet-stream^M
^M
GIF89an^@I^^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3^M
Content-Disposition: form-data; name="Upload"^M
^M
Submit Query^M
------------KM7cH2GI3cH2Ef1Ij5gL6GI3Ij5GI3-
Upvotes: 3
Views: 5898
Reputation: 28588
This probably contains some typos or something, but bear with me anyway. First determine the boundary (input is the file containing the data; pipe if necessary):
boundary=`grep '^Content-Type: multipart/form-data; boundary=' input|sed 's/.*boundary=//'`
Then filter the Filedata part:
fd='Content-Disposition: form-data; name="Filedata"'
sed -n "/$fd/,/$boundary/p"
The last step is to filter out a few extra lines: the header lines up to and including the empty line, and the boundary itself. So change the previous command to:
sed -n "/$fd/,/$boundary/p" | sed '1,/^$/d' | sed '$d'
sed -n "/$fd/,/$boundary/p" filters the lines between the Filedata header and the boundary (inclusive), sed '1,/^$/d' deletes everything up to and including the first empty line (so it removes the headers), and sed '$d' removes the last line (the boundary).
After this, you wait for Dennis (see comments) to optimize it and you get this:
sed "1,/$fd/d;/^$/d;/$boundary/,\$d"
Now that you've come this far, scratch all this and do what Ignacio suggested. The reason: this probably won't work reliably here, because a GIF is binary data.
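To make the binary-data concern concrete: the payload can itself contain newline bytes, and every line of the capture ends in CR LF, so line-based filters may cut or alter it. Below is a minimal Python sketch of a byte-oriented alternative. It assumes the whole capture fits in memory and that parts are delimited by CR LF as in the sample. The function name extract_part_body is made up for illustration.

```python
import re


def extract_part_body(raw: bytes, field: bytes = b"Filedata") -> bytes:
    """Return the raw bytes of the form field `field` from a captured POST.

    Hypothetical helper: assumes CR LF line endings and a single
    multipart/form-data request in `raw`.
    """
    # The HTTP headers end at the first blank line.
    head_end = raw.index(b"\r\n\r\n")
    head = raw[:head_end]

    # Read the boundary from the request's Content-Type header.
    match = re.search(rb"boundary=([^\r\n;]+)", head)
    if match is None:
        raise ValueError("no multipart boundary found in the headers")
    # Each part is preceded by CR LF + "--" + boundary.
    delimiter = b"\r\n--" + match.group(1).strip(b'"')

    # Keep one CR LF so the first delimiter matches like all the others.
    body = raw[head_end + 2:]
    for chunk in body.split(delimiter)[1:]:
        if chunk.startswith(b"--"):
            break  # closing delimiter: no more parts
        # Part headers and part data are separated by the first blank line.
        part_headers, sep, data = chunk.partition(b"\r\n\r\n")
        if sep and b'name="' + field + b'"' in part_headers:
            return data
    raise ValueError("field not found: %r" % field)
```

With the captured request in the file input, as above, the extracted bytes could be written out with open("logo.gif", "wb").write(extract_part_body(open("input", "rb").read())).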
Ah, it was a good exercise! Anyway, for the lovers of sed, here's the excellent page:
Outstanding information.
Upvotes: 0
Reputation: 798526
If you use Python, email.parser.Parser will allow you to parse a multipart MIME document.
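As a rough illustration of this suggestion, the sketch below uses email.parser.BytesParser, the bytes counterpart of Parser, since the upload is binary. It assumes the capture holds exactly one request and that the HTTP request line (POST ... HTTP/1.1) has to be dropped first, because it is not a MIME header. The function name extract_file_field is hypothetical.

```python
from email.parser import BytesParser


def extract_file_field(raw: bytes, field: str = "Filedata") -> bytes:
    """Return the decoded payload of the form field named `field`.

    Hypothetical helper: `raw` is the whole captured POST request.
    """
    # Drop the request line; what follows starts with ordinary headers,
    # including the multipart Content-Type that carries the boundary.
    _request_line, _sep, rest = raw.partition(b"\r\n")
    msg = BytesParser().parsebytes(rest)

    for part in msg.walk():
        # Each form field is named by its Content-Disposition "name" parameter.
        if part.get_param("name", header="content-disposition") == field:
            # decode=True returns bytes rather than str.
            return part.get_payload(decode=True)
    raise ValueError("field not found: " + field)
```

Reading the capture in binary mode (open("input", "rb")) and writing the result in binary mode keeps the GIF bytes from being re-encoded along the way.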
Upvotes: 2
Reputation: 80384
Look at the MIME::Tools suite for Perl. It has a rich set of classes; I’m sure you could put something together in just a few lines.
Upvotes: 1
Reputation: 107040
Do you want to do this while the file is being transferred, or after it has arrived?
Almost any scripting language should work. My AWK is a bit rusty, but...
awk '/^Content-Type: application\/octet-stream/,/^--------/'
That should print everything between the application/octet-stream and the ---------- lines. An awk range pattern is inclusive, so the output also includes both of those lines, which means you'll have to do something a bit more complex:
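BEGIN { state = 0 }
{
    # A boundary line ends the data section.
    if ($0 ~ /^------------/) {
        state = 0
    }
    # Inside the data section: print the line.
    if (state == 1) {
        print $0
    }
    # Start printing from the line after this header.
    if ($0 ~ /^Content-Type: application\/octet-stream/) {
        state = 1
    }
}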
The application\/octet-stream check comes after the print statement because you want to set state to 1 only after you have seen the application/octet-stream line, so that the header line itself is not printed.
Of course, being Unix, you could pipe the output of your program through awk and then save the file.
Upvotes: 2
Reputation: 2831
This may be a crazy idea, but I would try stripping the headers with procmail.
Upvotes: 1