Reputation: 3619
I was attempting to do a sed replacement in a binary file, but I am beginning to believe that is not possible. Essentially, what I wanted to do was similar to the following:
sed -bi "s/\(\xFF\xD8[[:xdigit:]]\{1,\}\xFF\xD9\)/\1/" file.jpg
The logic I wish to achieve is: scan through the binary file until the hex code FFD8, continue reading until FFD9, and save only what lies between them (discarding the junk before and after, but including FFD8 and FFD9 in the saved part of the file).
Is there a good way to do this, even without using sed?
EDIT: I was just playing around and found what is, IMO, the cleanest way to do it. I am aware that this grep statement is greedy.
hexdump -ve '1/1 "%.2x"' dirty.jpg | grep -o "ffd8.*ffd9" | xxd -r -p > clean.jpg
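One caveat when matching on the hex text: ffd8 can also match at an odd nibble offset (for example, the bytes 0f fd 80 dump as 0ffd80), which would shift every decoded byte. Below is a minimal Python sketch of the same greedy extraction done directly on the raw bytes, which sidesteps that. The function name extract_jpeg is made up for illustration, and it assumes the file fits in memory.

```python
def extract_jpeg(data: bytes) -> bytes:
    """Return the span from the first FFD8 through the last FFD9 (greedy)."""
    start = data.find(b"\xff\xd8")      # first start marker
    if start == -1:
        raise ValueError("no FFD8 marker found")
    end = data.rfind(b"\xff\xd9")       # last end marker, like the greedy .*
    if end < start + 2:                 # also covers end == -1
        raise ValueError("no FFD9 marker after FFD8")
    return data[start:end + 2]          # keep both markers in the result
```

To use it, read dirty.jpg in "rb" mode, pass the bytes to extract_jpeg, and write the result to clean.jpg in "wb" mode. Swapping rfind for find(b"\xff\xd9", start + 2) would stop at the first FFD9 instead.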
Upvotes: 35
Views: 48064
Reputation: 133
Old question, but,
xxd infile | sed 's/xxxx xxxx/yyyy yyyy/' | xxd -r > outfile
is probably the simplest and most reliable solution. Similar to the edit in the OP.
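Note that xxd's default dump groups bytes in pairs and wraps every 16 bytes, so the sed pattern only matches when the target bytes start on a group boundary and do not straddle a line break. For comparison, here is a rough Python sketch of an offset-independent replacement on the raw bytes. The name replace_bytes is hypothetical, and the equal-length check is an assumption that mirrors the same-length pattern in the command above.

```python
def replace_bytes(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace every occurrence of `old` with `new`, at any byte offset."""
    # Assumption: keep the length unchanged so the rest of the file
    # stays at the same offsets, as in the xxxx/yyyy example.
    if len(old) != len(new):
        raise ValueError("old and new must have the same length")
    return data.replace(old, new)
```

Read infile in "rb" mode, call replace_bytes, and write the result to outfile in "wb" mode.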
Upvotes: 9
Reputation: 7407
bbe is a "sed for binary files", and should work more efficiently for large binary files than hexdumping/reconstructing.
An example of its use:
$ bbe -e 's/original/replaced/' infile > outfile
Further information on the man page.
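If bbe is not available, a similar "don't hexdump the whole file" approach can be sketched in Python with mmap, which searches the file without first building a hex string. This is only a sketch under assumptions: the function name extract_between_markers and its parameters are invented for illustration, and it stops at the first FFD9 after FFD8 (use rfind for the greedy behaviour).

```python
import mmap


def extract_between_markers(src_path, dst_path,
                            start_marker=b"\xff\xd8",
                            end_marker=b"\xff\xd9"):
    """Copy the first start_marker...end_marker span to dst_path.

    Returns True if a span was written, False if a marker is missing.
    Note: mmap raises ValueError on an empty source file.
    """
    with open(src_path, "rb") as src, \
            mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = mm.find(start_marker)
        if start == -1:
            return False
        # Search for the end marker only after the start marker.
        end = mm.find(end_marker, start + len(start_marker))
        if end == -1:
            return False
        with open(dst_path, "wb") as dst:
            dst.write(mm[start:end + len(end_marker)])  # include both markers
    return True
```

For example, extract_between_markers("infile", "outfile") writes the cleaned span to outfile and leaves infile untouched.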
Upvotes: 53
Reputation: 129549
Also, this Perl might work (not tested, caveat emptor)... if Python is not installed :)
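# Slurp the whole file into $content in 8 KiB chunks.
open(FILE, "file.jpg") || die "no open $!\n";
while (read(FILE, $buff, 8 * 2**10)) {
    $content .= $buff;
}
# Match the raw bytes between the markers: with /s, "." matches any byte,
# including newlines ([:xdigit:] would only match ASCII hex digits).
@matches = ($content =~ /(\xFF\xD8.+?\xFF\xD9)/gs);
print STDOUT join("", @matches);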
You need to add binmode(FILE); binmode(STDOUT); after the open() call on DOS or VMS; it is not needed on Unix.
Upvotes: 1
Reputation: 343135
Is there a good way to do this
Yes, of course: use an image editing tool, such as those from ImageMagick, that knows how to edit JPEG metadata (search the net for linux jpeg, exif editor, etc.). I am sure you can find a tool that suits you. Don't try to do this the hard way. :)
Upvotes: 2
Reputation: 400652
sed might be able to do it, but it could be tricky. Here's a Python script that does the same thing (note that it edits the file in-place, which is what I assume you want to do based on your sed script):
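import re

# Open in binary read/write mode so the file can be rewritten in place.
with open('file.jpeg', 'rb+') as f:
    data = f.read()
    # Bytes pattern; DOTALL lets '.' match any byte, including newlines.
    # The data between the markers is raw binary, not ASCII hex digits.
    match = re.search(rb'\xff\xd8.+\xff\xd9', data, re.DOTALL)
    if match:
        # group(0) keeps both the FFD8 and FFD9 markers.
        f.seek(0)
        f.write(match.group(0))
        f.truncate()
    else:
        print('No match')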
Upvotes: 2