Reputation: 232
I'm completely new to Perl and I thought it would be the best language to solve my simple task. I need to convert a binary file into something readable, and need to find and replace byte sequences like \x00\x39 with \x09 (tab) or something like that.
From bash, I started with the following and it works great:
perl -pi -e 's/abc/123/g' test.txt
However, when I start to enter ASCII codes, I'm lost:
perl -pi -e 's/0x49/*/g' test.txt
perl -pi -e 's/{char(49)}/*/g' test.txt
What would this command look like as a line in a Perl script? I have a couple hundred of these find/replace operations and a 500MB text file. Are there any caveats that I need to know about?
Thanks so much for any help!
Gary
Upvotes: 3
Views: 9290
Reputation: 232
Wow, thank you very much. I learned that it wasn't as easy as I assumed. Wow, Perl is truly very complex ;-)
Here is what I came up with. I hope this will help someone.
BTW: If you happen to know whether this will also work with Perl on Windows, please let me know.
Thanks again,
Gary
#!/usr/bin/perl
use strict;
use warnings;
my $infile = '/Users/gc/Desktop/a.bin';
my $outfile = '/Users/gc/Desktop/b.txt'; # in and out can be the same file; file will be overwritten when it already exists
my $data = read_file($infile);
# 1st batch
$data =~ s/0\x01J[\x00-\x19]/\x09AnythingYouWant\x09/g;
$data =~ s/0\x00[\x00-\x19]/\x09AnythingYouWant\x09/g;
# 2nd batch
$data =~ s/\r\n/\x06/g; # CR LF into \x06 (must run first, or the two rules below consume its CR and LF)
$data =~ s/\r/\x06/g; # remaining CR into \x06
$data =~ s/\n/\x06/g; # remaining LF into \x06
# …
write_file($outfile, $data);
exit;
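sub read_file {
    my ($infile) = @_;
    # ':raw' turns off newline translation, which matters for binary data on Windows
    open my $in, '<:raw', $infile or die "Could not open '$infile' for reading: $!";
    local $/ = undef; # slurp mode: read the whole file in one go
    my $all = <$in>;
    close $in;
    return $all;
}
sub write_file {
    my ($outfile, $content) = @_;
    open my $out, '>:raw', $outfile or die "Could not open '$outfile' for writing: $!";
    print $out $content;
    close $out or die "Could not close '$outfile': $!";
    return;
}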
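With a couple hundred replacements, one way to keep the script manageable might be to hold the rules in a list and loop over them. This is only a sketch: the @rules table and the apply_rules name are made up for illustration, and the sample rules are taken from the examples in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rule table: each entry is [ compiled pattern, replacement string ].
# Order matters, so CR LF is listed before the lone CR and LF rules.
my @rules = (
    [ qr/\r\n/,     "\x06" ],
    [ qr/\r/,       "\x06" ],
    [ qr/\n/,       "\x06" ],
    [ qr/\x00\x39/, "\x09" ],   # the byte pair from the question becomes a tab
);

# Apply every rule in order to a copy of the data and return the result.
sub apply_rules {
    my ($data) = @_;
    for my $rule (@rules) {
        my ($pattern, $replacement) = @$rule;
        $data =~ s/$pattern/$replacement/g;
    }
    return $data;
}

my $sample = "G\x00\x39H\r\nK\rM\n";
my $result = apply_rules($sample);
# $result is now "G\tH\x06K\x06M\x06"
```

In the full script, apply_rules($data) would take the place of the individual s/// lines between read_file and write_file.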
Upvotes: 1
Reputation: 54373
Although it's a bit weird to do string replaces on a binary file, here's how to do it with your txt file:
use strict;
use warnings;
use Tie::File;
my @file;
tie @file, 'Tie::File', 'test.txt' or die $!;
foreach (@file) {
# your regexes go here
s/abc/123/g;
s/\x49/*/g; # \x49 is the character with hex code 49 ('I'); \0x49 would match NUL followed by "x49"
}
untie @file;
The Tie::File module (from the Perl core) allows you to access the lines of the file through an array. Changes will be saved to the file immediately. In the foreach loop, the file is processed line by line. Each line goes into $_, which is implicit and so does not appear in the code. The regex operations are also applied to $_ by default, so there's no need to write it out.
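To make the implicit $_ visible, here is a small sketch that runs the same substitution twice, once in the implicit form and once with a named loop variable. It uses a plain in-memory array rather than Tie::File, and the sample strings are invented for the illustration.

```perl
use strict;
use warnings;

my @lines = ("abc one", "two abc");

# Implicit form: foreach aliases $_ to each element, and s/// works on $_.
foreach (@lines) {
    s/abc/123/g;
}

# Explicit equivalent with a named loop variable.
my @copy = ("abc one", "two abc");
foreach my $line (@copy) {
    $line =~ s/abc/123/g;
}
# Both arrays now hold ("123 one", "two 123").
```

Because the loop variable is an alias and not a copy, the substitution changes the array elements themselves, which is what lets the tied array write the changes back to the file.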
However, I believe you are going about this the wrong way. In most cases, you will not be able to just read the file line by line. Refer to perlfaq as a starting point. Dealing with binary data is somewhat trickier than plain text processing, I'm afraid.
Upvotes: 0
Reputation: 241968
Use the \x## notation:
perl -pi~ -e 's/\x00/*/g' test.txt
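As a quick illustration of why the attempt with 0x49 in the question did not behave as expected, the following sketch (with an invented sample string) compares the two spellings inside a regex:

```perl
use strict;
use warnings;

my $text = "I wrote 0x49";

# \x49 is an escape for the character with hex code 49, which is 'I'.
(my $by_code = $text) =~ s/\x49/*/g;

# 0x49 has no special meaning in a regex: it matches the four characters "0x49".
(my $literal = $text) =~ s/0x49/*/g;

# $by_code is "* wrote 0x49"; $literal is "I wrote *"
```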
To replace each "special" character with its code in brackets, use the /e option:
perl -pi~ -e 's/([\x00-\x09\x0b-\x1f])/"[" . ord($1) . "]"/eg' test.txt
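Inside a script, the same idea could look roughly like this. It is a sketch on an invented sample string, and it assumes the intent is to rewrite every control character except the newline, so the class is spelled \x00-\x09 plus \x0b-\x1f.

```perl
use strict;
use warnings;

my $data = "P\x00Q\x1fR\n";

# /e evaluates the replacement as Perl code, so ord($1) is computed per match.
# The newline (\x0a) is deliberately left out of the character class.
(my $visible = $data) =~ s/([\x00-\x09\x0b-\x1f])/"[" . ord($1) . "]"/eg;

# $visible is "P[0]Q[31]R\n"
```

ord returns the decimal code; sprintf("%02x", ord($1)) could be used in its place if hex output is preferred.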
Upvotes: 7