Reputation: 31
I am working on a script that reads a file containing measurements in angstroms and converts them to nm (1 angstrom = 0.1 nm).
Here are some examples it should find and replace:
3A
12 A
2.75 angstroms
0.123 Angstroms
It should not alter the following examples:
I like the number 3. A very nice number.
There are 27 Aardvarks in London Zoo.
This is what I have so far. I have two problems: how can I perform the "divide by 10" after a match is found and write the result back to the file? And I have no idea what the regex should look like.
use strict;
use warnings;
my $filename = 'angstrom.txt';
open(FILE, $filename) or die "Can't open $filename: $!";
my @lines = <FILE>;
close(FILE);
open(FILE, ">$filename") or die "Can't write to $filename: $!";
foreach my $line (@lines) {
if($line =~ s/\d{2}\w//e)
{
print FILE (@lines);
}
}
close(FILE);
Upvotes: 1
Views: 86
Reputation: 53508
The problem with regular expressions is - they aren't all that good at 'understanding' a numeric value. They're about text.
You can do it in this specific case, because you're dividing by 10, but I wouldn't normally call it a good idea.
So instead - extract the value to change, and apply the division to it:
s|([\d\.]+) angstroms|$1 / 10 . " nm"|eig;
This will capture the 'digit+decimals' preceding the word 'angstroms', divide by 10, and then add in 'nm' instead.
The i flag makes the match case insensitive. The e flag says to 'evaluate' the replacement as Perl. The g flag does it "globally" per line - this may be irrelevant based on your sample data. Note - we also use | instead of the more common / delimiter, because we use / in the expression. (You could escape it, but I think this is clearer.)
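To see the e flag at work outside a one-liner, here is a minimal sketch. The helper name angstroms_to_nm and the sample sentence are made up for illustration; the substitution itself is the one shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of $text with "<number> angstroms"
# rewritten as "<number / 10> nm".
sub angstroms_to_nm {
    my ($text) = @_;
    # e: evaluate the replacement as Perl code
    # i: ignore case, g: replace every match in the string
    $text =~ s|([\d\.]+) angstroms|$1 / 10 . " nm"|eig;
    return $text;
}

print angstroms_to_nm("Bond length: 2.75 angstroms, spacing: 12 Angstroms\n");
# prints "Bond length: 0.275 nm, spacing: 1.2 nm"
```

Without the e flag the replacement would be treated as a plain string, so the division would never be carried out.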
So to do this to your file - we can make use of a perlrun flag, -i, for in-place editing. (Specify an extension after it - Perl renames the source to that extension, and then rewrites the file in place.)
perl -i.bak -pe 's|([\d\.]+) angstroms|$1 / 10 . " nm"|eig' angstrom.txt
Or you could splice the above into your code.
I would normally suggest avoiding a 'read everything, then overwrite the same file' operation like the one in your code, because a code glitch can mean you lose your source data. Opening a new output file, writing to it, and then renaming it once you've finished (successfully) is better practice.
(It also consumes memory proportionate to your source file. This is often a non-issue, but can sometimes become relevant).
Given your code needs to match A, Angstrom or Angstroms (I assume you have no 'amps' to worry about?), and must leave a sentence-ending "3. A" alone, tighten the number pattern as well:
perl -i.bak -pe 's|(\d+(?:\.\d+)?)\s*a(?:ngstroms?)?\b|$1 / 10 . " nm"|eig' angstrom.txt
This goes the extra step of matching a, A, angstroms or Angstroms, and we have \b to require a word break immediately after. So "12 Apples" won't catch us out.
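A quick way to gain confidence in a pattern like this is to run it over the examples from the question. The sketch below is only that, a sketch: convert_line is a hypothetical helper name, and the number pattern used here, \d+(?:\.\d+)?, requires digits on both sides of any decimal point, so a sentence-ending "3." followed by "A" is not read as a value plus unit.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: convert angstrom values in $text to nm.
sub convert_line {
    my ($text) = @_;
    # number, optional whitespace, then "a", "angstrom" or "angstroms"
    # (any case), followed by a word boundary
    $text =~ s|(\d+(?:\.\d+)?)\s*a(?:ngstroms?)?\b|$1 / 10 . " nm"|eig;
    return $text;
}

# Examples from the question that should be converted ...
print convert_line($_), "\n"
    for '3A', '12 A', '2.75 angstroms', '0.123 Angstroms';

# ... and the ones that should come back unchanged.
print convert_line($_), "\n"
    for 'I like the number 3. A very nice number.',
        'There are 27 Aardvarks in London Zoo.';
```

The first four lines come out as 0.3 nm, 1.2 nm, 0.275 nm and 0.0123 nm; the two sentences are printed as they went in.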
Perhaps ironically - -i.bak -pe is probably easier than writing it longhand. However, if you did want to:
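#!/usr/bin/perl
use strict;
use warnings;

my $filename = 'angstrom.txt';
open( my $input,  '<', $filename )       or die "Can't open $filename: $!";
open( my $output, '>', "$filename.new" ) or die "Can't write to $filename.new: $!";

# Make $output the default filehandle for print.
select $output;
while (<$input>) {
    # Convert e.g. "12 A" or "2.75 angstroms" to nm.
    s|(\d+(?:\.\d+)?)\s*a(?:ngstroms?)?\b|$1 / 10 . " nm"|eig;
    print;
}
select STDOUT;
close($input);
close($output) or die "Can't close $filename.new: $!";

# Only replace the original once the new file has been written successfully.
rename( "$filename.new", $filename ) or die "Can't rename $filename.new: $!";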
Upvotes: 1