awk split on hex string

Question

I have a file that contains multiple JPEGs, and I would like to split it into individual JPEG files.

The easy part is finding where each image begins: the bytes 0xFF 0xD8 0xFF 0xE1 mark the start of the JPEG followed by the EXIF data segment, which in my case is always at the beginning.

I found this awk command for splitting a file on a pattern:

awk '/string/{n++}{print >"out" n ".txt" }' final.txt

However, it does not work as expected when I use it with hex values:

awk '/0xFF0xD8 0xFF0xE1/{n++}{print >"out" n ".txt" }' final.txt

The awk documentation says that strings prefixed with 0x are treated as hex, but that does not seem to work here.

Edit: I found this: https://superuser.com/questions/174362/how-to-split-binary-file-based-on-pattern but it does not work for me. It should create 2 files, but only one is created, and it is only 11 bytes in size.

hfs · Accepted Answer

Are you sure awk handles binary files well? I thought it would expect newlines.
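As a quick check of the pattern from the question, the sketch below (assuming a POSIX shell and a standard awk; the helper name pattern_hits and both sample inputs are made up for illustration) suggests that awk reads /0xFF0xD8 0xFF0xE1/ as literal characters inside a regex rather than as byte values:

```shell
# Pattern copied from the question; helper name and inputs are illustrative.
pattern_hits() {
    # Count input lines matching the pattern; LC_ALL=C keeps awk byte-oriented.
    LC_ALL=C awk '/0xFF0xD8 0xFF0xE1/ { c++ } END { print c + 0 }'
}

# ASCII text that literally spells out the pattern.
text_hits=$(printf '0xFF0xD8 0xFF0xE1\n' | pattern_hits)

# The four raw bytes FF D8 FF E1, written as octal escapes.
byte_hits=$(printf '\377\330\377\341\n' | pattern_hits)

echo "literal text: $text_hits, raw bytes: $byte_hits"
```

The pattern matches only the spelled-out text, so it would never fire on the real marker bytes in a JPEG.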

Perl can use hex escapes in regexes (basic idea from this answer):

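#!/usr/bin/perl
use strict;
use warnings;

undef $/;            # slurp mode: read the whole input at once
my $data = <>;
my $n = 0;
# Split before each marker; the lookahead keeps the marker in its chunk.
# FF D8 FF E1 is the marker from the question (use \xE0 for JFIF files).
for my $content (split /(?=\xFF\xD8\xFF\xE1)/, $data) {
        my $name = "out" . ++$n . ".txt";
        open(my $out, '>:raw', $name) or die "Cannot open $name: $!";
        print {$out} $content;
        close($out) or die "Cannot close $name: $!";
}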
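To try the approach on known input, the following sketch (assuming a POSIX shell with perl and mktemp available) builds a small made-up file with two chunks that each start with the marker from the question, splits it with an equivalent one-liner, and prints the size of each piece. The file name sample.bin and the filler bytes are hypothetical.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two fake "images": marker FF D8 FF E1 plus filler (8 and 10 bytes in total).
printf '\377\330\377\341AAAA\377\330\377\341BBBBBB' > sample.bin

# -0777 makes perl read the whole file at once; split before each marker.
perl -0777 -ne '
    my $n = 0;
    for my $chunk (split /(?=\xFF\xD8\xFF\xE1)/) {
        open(my $out, ">:raw", "out" . ++$n . ".txt") or die "open: $!";
        print {$out} $chunk;
        close($out);
    }
' sample.bin

# Report the size in bytes of each piece.
size1=$(wc -c < out1.txt | tr -d ' ')
size2=$(wc -c < out2.txt | tr -d ' ')
echo "out1.txt: $size1 bytes, out2.txt: $size2 bytes"
```

If the sizes add up to the size of the input and each piece starts with the marker, the split is behaving as intended.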
