How to separate one .jpeg file into several original ones?

So, I was messing around with bash, and accidentally merged several .jpg files that had a .txt extension into one .jpg. The stupidity I wrote was

mv ./*.txt .jpg

Any ideas on how I could undo it? The file created has a size that matches the combined size of the other files, so I guess it's all there, but I can't figure out how to separate the binary...

(yes, why on earth did I do it? who the hell knows)

Thanks for your help.

Upvotes: 0

Views: 858

Answers (1)

Mark Setchell

Reputation: 207748

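JPEG files start with the bytes ff d8 (the start-of-image marker). od -x prints 2-byte words in host byte order, so on a little-endian machine the marker shows up as d8ff (a marker at an odd offset is split across two words and will be missed by this grep). So, hex dump your file with decimal addresses and grep for d8ff:

 od -Ad -x yourTextfile.txt | grep d8ff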
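A quick way to check the byte order for yourself is to dump the first few bytes one at a time instead of as 2-byte words. This is only a minimal sketch, assuming a POSIX od; the function name first_bytes_hex is made up for illustration.

```shell
# Print the first 4 bytes of a file as individual hex bytes.
# With -t x1, od prints single bytes, so the output does not depend on
# the machine's byte order the way the 2-byte words of od -x do.
first_bytes_hex() {
    od -An -t x1 -N 4 "$1"
}

# Example: first_bytes_hex some_known_good.jpg
# A JFIF JPEG starts with the bytes ff d8 ff e0.
```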

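Then get the byte offsets, and use dd to extract the files. The address at the start of each od line is the offset of that line's first byte (od prints it in octal unless you pass -Ad); add 2 for every word that precedes d8ff on the line. So, if a JPEG marker is seen at byte offset 2048, do this: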

dd if=yourTextfile.txt bs=1 skip=2048 > file1.jpg

You will need to set the block size to 1 (bs=1) so that the offset is counted in bytes rather than in 512-byte blocks. skip is the standard operand; some dd versions also accept iseek as a synonym.

Note: Each extracted file will have extraneous junk at the end (everything that follows it in the merged file), but most image editors will ignore it.
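With bs=1, dd copies one byte per read, which gets slow on large files. A possible alternative, sketched below, cuts a byte range out with tail and head, and stopping at the next marker's offset leaves the trailing data out. extract_range is a hypothetical helper name, and head -c is assumed to be available (it is on GNU and BSD systems, but POSIX does not require it). Be aware that an embedded thumbnail carries its own ff d8 marker, so "the next marker" is not always the start of the next file.

```shell
# extract_range FILE START END OUT
# Copy bytes START..END-1 (0-based offsets) of FILE into OUT.
extract_range() {
    # tail -c +N starts output at byte N, counting from 1, hence START+1
    tail -c "+$(( $2 + 1 ))" "$1" | head -c "$(( $3 - $2 ))" > "$4"
}

# Example, assuming (hypothetically) markers at offsets 2048 and 90112:
# extract_range yourTextfile.txt 2048 90112 file1.jpg
```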

Updated

Here is a slightly prettier way to recover the files in Perl...

#!/usr/bin/perl
use strict;
use warnings;

my $contents;
my $i=0;

# Slurp entire file, in binary mode so no bytes get translated
{
  local $/ = undef;
  open(my $in, '<:raw', "file.txt") or die "Couldn't open file: $!";
  $contents = <$in>;
  close($in);
}
# Every occurrence of the start-of-image marker (ff d8) is a candidate JPEG
while((my $offset=index($contents,"\xff\xd8"))!=-1){
    my $filename=sprintf("recovered_%d.jpg",$i++);
    print "Recovering at offset: $offset into file: $filename\n";
    open(my $out, '>:raw', $filename) or die "Couldn't write $filename: $!";
    print {$out} substr($contents,$offset,10000000);   # Copy up to 10MB starting at the marker
    close($out);
    substr($contents,$offset,2)="\x00\x00";   # Overwrite this header so we find another next time
}

Uglier shell version:

You can code it something like this - quick and dirty code warning!!!

#!/bin/bash
file=file.txt       # Edit with the name of your text file with all JPEGs in it
n=0
# Dump octal bytes with decimal addresses; print the offset of every ff d8 ff e0
od -Ad -b "$file" | \
   awk '/377 330 377 340/ {for(i=2;i<=NF-3;i++)   # i<=NF-3 so the last 4 bytes on a line are checked too
                              if($i=="377" && $(i+1)=="330" && $(i+2)=="377" && $(i+3)=="340")
                                print $1+i-2
                          }' | \
   while read -r offset; do
      name="recovered_${n}.jpg"
      echo "Possible JPEG at offset $offset - recovering following 10MB as $name"
      dd if="$file" of="$name" bs=1 skip="$offset" count=10000000
      ((n++))
   done

If you are unlucky, an image's header may straddle two lines of od's output. In that case the 377 330 377 340 will not all be on the same line and the script will not find that image. Rather than code around this, I would suggest you run the script once and copy all the recovered files to another directory (to save them, since the next run reuses the same names). Then run it again after adding 4 or more bytes to the start of the file, which forces the entire JPEG signature onto the next line, like this:

(echo spacer; cat file.txt ) > file2.txt

Then run the script again, but on file2.txt this time.
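Another way around the line-boundary problem is to treat od's output as one long stream of byte values and let awk keep a running byte count, so a signature is found wherever it falls. This is only a sketch: find_jpeg_offsets is a made-up name, and it matches the three bytes ff d8 ff rather than the four-byte JFIF signature. That also catches Exif files (which start ff d8 ff e1), but it may report embedded thumbnails as well.

```shell
# find_jpeg_offsets FILE
# Print the decimal byte offset of every ff d8 ff sequence in FILE.
find_jpeg_offsets() {
    # -An: no address column, -v: never collapse repeated lines,
    # -t u1: one unsigned decimal value per byte
    od -An -v -t u1 "$1" | awk '
        BEGIN { p1 = -1; p2 = -1; n = 0 }    # previous two bytes, bytes seen so far
        {
            for (i = 1; i <= NF; i++) {
                b = $i + 0
                if (p2 == 255 && p1 == 216 && b == 255)
                    print n - 2              # offset of the first byte of the match
                p2 = p1; p1 = b; n++
            }
        }'
}

# Example: find_jpeg_offsets file.txt
```

Each offset it prints can be passed to dd in the same way as in the script above.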

Upvotes: 4
