Reputation: 35
So, I was messing around with bash, and accidentally merged several .jpg files that had a .txt extension into one .jpg. The stupidity I wrote was:
mv ./*.txt .jpg
Any ideas on how I could undo it? The file created has a size that's compatible with the combined size of the other files, so I guess it's all there, but I can't figure out how to separate the binary...
(yes, why on earth did I do it? who the hell knows)
Thanks for your help.
Upvotes: 0
Views: 858
Reputation: 207748
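JPEG files start with the two bytes ff d8 (the start-of-image marker), which od -x prints as d8ff on a little-endian machine. So, hex dump your file with decimal addresses and grep for d8ff:
od -Ad -x yourTextfile.txt | grep d8ff
Each output line starts with the byte offset of its first byte, so add the marker's position within the line to get the marker's own offset. Note that this only finds markers that sit at an even offset. Then use dd to extract the files. So, if a JPEG marker is seen at offset 2048, do this:
dd if=yourTextfile.txt bs=1 iseek=2048 > file1.jpg
You need to set the block size to 1 (bs=1) so that the offset is counted in bytes rather than 512-byte blocks. Some dd versions use skip rather than iseek.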
Note: You will get extraneous junk at the end of the JPEG file, but most image editors will ignore it.
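If the trailing junk matters, the bytes between one marker offset and the next can be cut out exactly. The following is only a minimal sketch: extract_range is a hypothetical helper name, the offsets in the usage comment are made up, and it assumes your head supports -c (GNU and BSD versions do).

```shell
#!/bin/bash
# extract_range FILE START END OUT   (hypothetical helper, not part of any tool)
# Copy bytes START..END-1 (0-based, END exclusive) of FILE into OUT.
extract_range() {
    local file=$1 start=$2 end=$3 out=$4
    # tail -c +N starts output at the Nth byte (1-based), so add 1 to the offset;
    # head -c then keeps only the bytes up to the next marker.
    tail -c "+$((start + 1))" "$file" | head -c "$((end - start))" > "$out"
}

# Illustrative usage, assuming markers were found at offsets 2048 and 51200:
# extract_range yourTextfile.txt 2048 51200 file1.jpg
```

This also avoids copying one byte at a time, which is what bs=1 makes dd do.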
Updated
Here is a slightly prettier way to recover the files in Perl...
#!/usr/bin/perl
use strict;
use warnings;

my $contents;
my $i = 0;

# Slurp the entire file in binary mode
{
    local $/ = undef;
    open(my $in, '<:raw', 'file.txt') or die "Couldn't open file: $!";
    $contents = <$in>;
    close($in);
}

# Treat every "\xff\xd8" as the possible start of a JPEG
while ((my $offset = index($contents, "\xff\xd8")) != -1) {
    my $filename = sprintf("recovered_%d.jpg", $i++);
    print "Recovering at offset: $offset into file: $filename\n";
    open(my $out, '>:raw', $filename) or die "Couldn't write $filename: $!";
    print $out substr($contents, $offset, 10000000);   # Up to 10MB from the marker
    close($out);
    substr($contents, $offset, 2) = "\x00\x00";   # Overwrite this header so we find another next time
}
Uglier shell version:
You can code it something like this - quick and dirty code warning!!!
#!/bin/bash
file=file.txt   # Edit with the name of your text file with all JPEGs in it
n=0
# -Ad prints decimal offsets, -b prints each byte in octal, -v stops od collapsing repeated lines
od -Ad -v -b "$file" |
awk '/377 330 377 340/ {
       # Fields 2..NF are bytes, so the last 4-byte window starts at field NF-3
       for (i = 2; i <= NF-3; i++)
         if ($i=="377" && $(i+1)=="330" && $(i+2)=="377" && $(i+3)=="340")
           print $1+i-2
     }' |
while read -r offset; do
  name="recovered_${n}.jpg"
  echo "Possible JPEG at offset $offset - recovering following 10MB as $name"
  dd if="$file" of="$name" bs=1 iseek="$offset" count=10000000
  ((n++))
done
If you are unlucky, an image's header may span two lines of the output of od, and in that case the 377 330 377 340 will not all be on the same line and the script will not find that image. Rather than code around this, I would suggest you run the script once and copy all the recovered files to another directory (to save them), then run it again after adding between 4 and 12 bytes to the start of the file in order to force the entire JPEG signature onto the next line, like this:
(echo spacer; cat file.txt ) > file2.txt
Then run the script again, but on file2.txt this time.
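As an alternative to the spacer trick, the scan can remember the previous two bytes as it moves from one line of od output to the next, so a signature split across lines is still seen. This is only a rough sketch: find_jpeg_offsets is a hypothetical function name, and it looks for the three bytes ff d8 ff (decimal 255 216 255) instead of the four-byte pattern used above, on the assumption that the start-of-image marker is always followed by another marker.

```shell
#!/bin/bash
# find_jpeg_offsets FILE   (hypothetical helper)
# Print the decimal offset of every ff d8 ff sequence in FILE.
find_jpeg_offsets() {
    # -An drops the address column, -v disables line collapsing,
    # -tu1 prints every byte as an unsigned decimal number.
    od -An -v -tu1 "$1" | awk '
        {
            for (i = 1; i <= NF; i++) {
                b = $i + 0
                # p2 and p1 hold the two bytes before b, even across lines
                if (p2 == 255 && p1 == 216 && b == 255)
                    print pos - 2
                p2 = p1; p1 = b; pos++
            }
        }'
}

# Illustrative usage:
# find_jpeg_offsets file.txt
```

Each printed offset can then be fed to dd exactly as in the loop above. Expect it to be slow on large files, since awk handles every byte individually.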
Upvotes: 4