Richard Laurant

Reputation: 657

unix: delete everything before a sequence of bytes

I have binary files and I want to delete everything up to and including a certain sequence of bytes (five consecutive 0x7e bytes, i.e. '~~~~~'). For example, I have a file test:

hexdump test
0000000 000a 4ffa 0a0d 7e7e 7e7e 837e 646f 0110
0000010 8318 dac3                              
0000014

The result should be:

hexdump test1
0000000 6f83 1064 1801 c383 00da               
0000009

I tried it with cat test | sed 's/.*~~~~~//', but it only deleted the '~~~~~' and kept the rest.

Upvotes: 1

Views: 339

Answers (1)

Wintermute

Reputation: 44023

Using sed with binary files is not going to end well, since it does some locale- and encoding-dependent things and generally expects to work on text files. There is another utility, bbe (binary block editor), that is better suited to this task. With it, you can do this:

bbe -b ':/~~~~~/' -e 'D 1' test

This states that blocks are units that end with ~~~~~ and instructs bbe to delete the first of them (D 1).
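If bbe is not available, the same operation (drop everything up to and including the first ~~~~~) can be sketched in Python, which handles bytes without any locale involvement. This is only a minimal sketch, not part of bbe: the function name strip_through_marker is made up for illustration, and the sample bytes are the 20 bytes of the test file from the question.

```python
def strip_through_marker(data: bytes, marker: bytes = b"~" * 5) -> bytes:
    """Return the bytes after the first occurrence of marker.

    If marker does not occur at all, data is returned unchanged.
    """
    pos = data.find(marker)
    if pos == -1:
        return data
    return data[pos + len(marker):]


# The bytes of the question's file "test" (0x7e is '~').
sample = bytes.fromhex("0a00fa4f0d0a7e7e7e7e7e836f6410011883c3da")
print(strip_through_marker(sample).hex())  # 836f6410011883c3da
```

To apply this to a real file, read it with open(path, "rb") and write the result to a file opened with "wb", so that nothing gets decoded or re-encoded along the way.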

The problem you run into with sed, discounting encoding snafu, is that sed works line by line. If you are hell-bent on doing it with sed (in which case you can expect random failures), this might work on some platforms:

sed '1,/~~~~~/ { /~~~~~/!d; s/^.*~~~~~// }' test

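Within the pattern range 1,/~~~~~/ (from the first line to the first later line that contains ~~~~~), this deletes lines that do not contain ~~~~~ and removes the part up to ~~~~~ from the line that eventually does. This is more brittle than the bbe approach in more ways than one. Apart from the encoding snafu, it will break if ~~~~~ appears twice between two 0a (newline) bytes, because .* is greedy and the s command strips everything up to the last ~~~~~ on that line. It will also misbehave if ~~~~~ already appears on the very first line, since sed only starts looking for the end of the range on line 2. If this is for serious use, go with bbe.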
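To see the "appears twice" failure without building a binary test file, here is a small illustration using Python's re module on bytes. It is not sed, and the sample line is invented, but the greedy .* behaves the same way as in the s command above. The non-greedy .*? form is shown only for contrast; POSIX regular expressions, and therefore sed, have no such quantifier.

```python
import re

MARKER = b"~~~~~"

# An invented "line" (no newline byte inside) that contains the marker twice.
line = b"head" + MARKER + b"keep" + MARKER + b"tail"

# Greedy .* runs to the LAST marker on the line, like sed's s/^.*~~~~~//.
greedy = re.sub(rb"^.*" + re.escape(MARKER), b"", line, count=1)

# Non-greedy .*? stops at the FIRST marker, which is what was wanted.
first = re.sub(rb"^.*?" + re.escape(MARKER), b"", line, count=1)

print(greedy)  # b'tail'
print(first)   # b'keep~~~~~tail'
```

The greedy result silently loses the bytes between the two markers, which is the kind of random failure to expect from the sed version on real data.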

Upvotes: 1
