Reputation: 147206
I would like to know which pattern I can use in sed to change the first line of huge files (~2 GB). I prefer sed only because I assume it must be faster than a Python or Perl script.
The files have the following structure:
field 1, field 2, ... field n
data
Because the field names are likely to contain spaces, I need to replace every space in the header with an underscore, like this:
**BEFORE**
the first name,the second name,the first surname,a nickname, ...
data
**AFTER**
the_first_name,the_second_name,the_first_surname,a_nickname, ...
data
Any pointers to the right pattern, or another scripting solution, would be great.
Upvotes: 10
Views: 9257
Reputation: 16786
I don't think you want to use any solution that requires the data to be written to a new file.
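If you're pretty sure that all you need is to change the spaces into underscores in the first line of the large text files, you only have to read the first line, swap the characters and write it back in place. This is safe because replacing a space with an underscore keeps the line exactly the same length, so the rest of the file is not disturbed: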
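#!/usr/bin/env perl
use strict;
use warnings;

my $filename = shift or die "usage: $0 FILE\n";

# "+<" opens the file for reading and writing without truncating it.
open my $fh, '+<', $filename or die "can't open $filename: $!";
my $line = <$fh>;
$line =~ s/ /_/g;    # same byte length, so nothing after the header moves
seek $fh, 0, 0 or die "can't seek in $filename: $!";    # go back to the start of the file
print {$fh} $line;   # print, not printf: a '%' in the header would be read as a format
close $fh or die "can't close $filename: $!";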
To use it, just pass the full path of the file to update:
# fixheader "/path/to/myfile.txt"
Upvotes: 10
Reputation: 30225
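To edit the first 10 lines with sed, use an address range:
sed -i -e '1,10s/ /_/g' file
In Perl, you can use the flip-flop operator in scalar context (a constant operand is compared against the current line number, $.):
perl -i -pe 's/ /_/g if 1 .. 10' file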
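The question only needs line 1, so the range can be narrowed to the single-line address `1`. A minimal sketch, assuming GNU sed (BSD/macOS sed spells in-place editing `-i ''`) and using a small hypothetical sample file instead of a real 2 GB one:

```shell
#!/bin/sh
# Sketch: apply the substitution to line 1 only, using sed's single-line address.
# Assumes GNU sed; BSD/macOS sed needs -i '' instead of plain -i.
set -eu

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
file="$tmpdir/sample.csv"    # hypothetical sample file

# A header whose field names contain spaces, plus a data line that also has spaces.
printf '%s\n' 'the first name,the second name,a nickname' \
              'john smith,jane doe,jj' > "$file"

# '1s/ /_/g': substitute on line 1 only; every other line passes through unchanged.
sed -i -e '1s/ /_/g' "$file"

cat "$file"
```

Keep in mind that `sed -i` does not patch the file where it sits: GNU sed writes its output to a temporary file and renames it over the original, so it still reads and writes the whole ~2 GB. The single-line Perl equivalent is `perl -i -pe 's/ /_/g if $. == 1' file`, which has the same cost.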
Upvotes: 24
Reputation: 96827
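This could be a solution:
use strict;
use warnings;
use Tie::File;

tie my @array, 'Tie::File', 'path_to_file' or die "can't tie path_to_file: $!";
$array[0] =~ s/ /_/g;    # element 0 is the first line of the file
untie @array;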
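Tie::File is one of the modules I use the most, and it's very simple to use. Each element in the array is a line in the file. It does not load the whole file into memory; one downside, however, is that if the new first line has a different length from the old one, Tie::File has to shift, and therefore rewrite, everything after it, which is slow on a 2 GB file.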
Upvotes: -1
Reputation: 62593
The change you mention (replacing every space with an underscore) doesn't change the line's length, so in theory it could be done in place.
Warning: untested!
head -n 1 yourfile | sed -e 's/ /_/g' > tmpfile
dd conv=nocreat,notrunc if=tmpfile of=yourfile
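I'm not completely sure about the conv=... parameters, but they should make dd overwrite the start of the original file with the transformed line: notrunc stops dd from truncating the output file, and nocreat makes it fail instead of creating a new file.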
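A sketch of the same dd approach with a safety check added: it compares the byte length of the old and new header and only overwrites when they match. It assumes GNU coreutils (`conv=nocreat` is a GNU extension), and the file names are hypothetical stand-ins for `yourfile` and `tmpfile`:

```shell
#!/bin/sh
# Sketch: overwrite the first line in place with dd, but only if its length is unchanged.
# Assumes GNU coreutils dd (conv=nocreat is a GNU extension).
set -eu

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
file="$tmpdir/yourfile"      # hypothetical data file
header="$tmpdir/header"      # hypothetical scratch file for the new first line

printf '%s\n' 'the first name,the second name,a nickname' \
              'data 1' 'data 2' > "$file"

# Transform only the first line into a small separate file.
head -n 1 "$file" | sed -e 's/ /_/g' > "$header"

# Byte counts include the trailing newline; tr strips the padding some wc versions add.
old_len=$(head -n 1 "$file" | wc -c | tr -d ' ')
new_len=$(wc -c < "$header" | tr -d ' ')

if [ "$old_len" -ne "$new_len" ]; then
    echo "header length changed ($old_len -> $new_len); refusing in-place edit" >&2
    exit 1
fi

# notrunc: keep every byte after the overwritten header.
# nocreat: fail rather than create the target if it does not exist.
dd if="$header" of="$file" conv=nocreat,notrunc 2>/dev/null

cat "$file"
```

Only the header bytes are rewritten, so the amount of data written does not grow with the size of the file.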
Please note: if you want to do any other transformation that could alter the line's length, do not, do not do this. You'd have to make a full copy, something like this:
head -n 1 yourfile | sed -e 's/ /_/g' > tmpfile
tail -n +2 yourfile | cat tmpfile - > transformedfile
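A sketch of the full-copy route, writing the new file next to the original and renaming it over the original once it is complete. The sample data is hypothetical, and the extra substitution of `name` with `field` is there only to show an edit that changes the header's length:

```shell
#!/bin/sh
# Sketch: full copy for header edits that may change length, then replace the original.
set -eu

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
file="$tmpdir/yourfile"      # hypothetical input file
out="$file.tmp"              # temporary output in the same directory

printf '%s\n' 'first name,last name' 'ada,lovelace' 'alan,turing' > "$file"

{
    # Transformed header; this one changes the length ("name" -> "field").
    head -n 1 "$file" | sed -e 's/ /_/g' -e 's/name/field/g'
    # Everything from line 2 onward, copied unchanged.
    tail -n +2 "$file"
} > "$out"

# Replace the original only after the copy has been written completely.
mv "$out" "$file"

cat "$file"
```

Because `mv` within one file system is a rename, other readers never see a half-written file; the price is a full read and write of the data and enough free disk space for a second copy.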
Upvotes: 4
Reputation: 42674
You are unlikely to notice any speed difference between Perl, Python, and sed. Your script will spend most of its time waiting for IO.
If the lines are the same length, you can edit in-place, otherwise you will have to create a new file.
In Perl:
#!/usr/bin/env perl
use strict;
use warnings;
my $filename = shift;
open my $in_fh, '<', $filename
  or die "Cannot open $filename for reading: $!";
my $first_line = <$in_fh>;
open my $out_fh, '>', "$filename.tmp"
  or die "Cannot open $filename.tmp for writing: $!";
$first_line =~ s/ /_/g;    # the header translation goes here
print {$out_fh} $first_line;
print {$out_fh} $_ while <$in_fh>; # sysread/syswrite is probably better
close $in_fh;
close $out_fh or die "Cannot close $filename.tmp: $!";
# overwrite original with modified copy
rename "$filename.tmp", $filename
  or warn "Failed to move $filename.tmp to $filename: $!";
Upvotes: 5