BSileNTz

Reputation: 31

Perl: How to use regex for replacement in file?

I am working on a script which will read a file containing measurements in angstroms and convert them into nm (1 angstrom = 0.1 nm).

Here are some examples that it should find and replace:

3A

12 A

2.75 angstroms

0.123 Angstroms

It should not alter the following examples:

I like the number 3. A very nice number.

There are 27 Aardvarks in London Zoo.

This is what I have so far. There are two things I have problems with: how can I perform the "divide by 10" after a match is found and write the result back to the file, and what should the regex look like? I just don't have any idea about the regex.

use strict;
use warnings;

my $filename = 'angstrom.txt';   

open(FILE, $filename) or die "Can't open $filename: $!";
my @lines = <FILE>;
close(FILE);

open(FILE, ">$filename") or die "Can't write to $filename: $!";
foreach my $line (@lines) {
    if($line =~ s/\d{2}\w//e)
    {   
        print FILE (@lines); 
    }
}
close(FILE);

Upvotes: 1

Views: 86

Answers (1)

Sobrique

Reputation: 53508

The problem with regular expressions is - they aren't all that good at 'understanding' a numeric value. They're about text.

You can do it in this specific case, because you're dividing by 10, but I wouldn't normally call it a good idea.

So instead - extract the value to change, and apply a division to it:

s|([\d\.]+) angstroms|$1 / 10 . " nm"|eig;

This will capture the digits and decimal point preceding the word 'angstroms', divide by 10, and then add in 'nm' instead.

  • The i flag makes the match case insensitive.
  • The e flag says to 'evaluate' the replacement as perl.
  • The g flag does it "globally", replacing every match on a line rather than just the first - this may be irrelevant based on your sample data.

Note - we also use | instead of the more common / delimiter, because we use / in the expression. (You could escape it, but I think this is clearer)
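As a quick check of what the e flag does, the same substitution can be run against a few in-memory strings. This is only a sketch: the convert helper and the sample strings are made up for illustration and are not part of the question's script.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption of this example): applies the
# substitution to a copy of the string and returns the result.
sub convert {
    my ($text) = @_;
    # $1 holds the captured number. With /e the replacement is run as Perl
    # code, so "$1 / 10" is real arithmetic and "." appends the new unit.
    $text =~ s|([\d\.]+) angstroms|$1 / 10 . " nm"|eig;
    return $text;
}

print convert("2.75 angstroms"), "\n";     # 0.275 nm
print convert("0.123 Angstroms"), "\n";    # 0.0123 nm
print convert("no units here"), "\n";      # unchanged
```

Without the e flag the replacement would be treated as a plain string, and the text "2.75 / 10 . " nm"" would be written into the line instead of the computed value.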

So to do this to your file - we can make use of the perlrun flag -i (in-place edit). If you specify an extension after it, as in -i.bak, Perl keeps the original as a backup under that extension and writes the edited text to the original filename.

perl -i.bak -pe 's|([\d\.]+) angstroms|$1 / 10 . " nm"|eig' angstrom.txt

Or you could splice the above into your code.

I would normally suggest avoiding a 'read-write' operation like the one in your script, because a code glitch can mean you lose your source data. A better practice is to open a new output file, write to it, and rename it over the original once you have finished successfully.

(Reading the whole file into an array also consumes memory proportionate to your source file. This is often a non-issue, but can sometimes become relevant.)

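Given your code needs to match A, Angstrom or Angstroms (I assume you have no 'amps' to worry about?):

perl -i.bak -pe 's|(\d+(?:\.\d+)?)\s*a(?:ngstroms?)?\b|$1 / 10 . " nm"|eig' angstrom.txt

This goes the extra step of matching a, A, angstrom or angstroms in any case, and we have \b to require a word break immediately after. So "12 Apples" or "27 Aardvarks" won't catch us out. The number pattern is also tightened to \d+(?:\.\d+)? (digits, optionally followed by a decimal point and more digits), because [\d\.]+ would swallow the full stop in "the number 3. A very nice number" and convert it.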
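The sketch below checks a pattern of this kind against the examples from the question. It uses \d+(?:\.\d+)? for the number and a(?:ngstroms?)? for the unit, and the to_nm helper is hypothetical, introduced only so the substitution can be exercised on strings instead of a file.

```perl
use strict;
use warnings;

# Hypothetical helper for this sketch: convert angstrom measurements in a
# string to nm and return the new string.
sub to_nm {
    my ($text) = @_;
    # Number: digits with an optional fractional part, so a sentence-ending
    # full stop ("3. A very ...") is not taken as part of the number.
    # Unit: "a", "angstrom" or "angstroms", any case, ending at a word break.
    $text =~ s|(\d+(?:\.\d+)?)\s*a(?:ngstroms?)?\b|$1 / 10 . " nm"|eig;
    return $text;
}

# Examples from the question that should be converted.
print to_nm($_), "\n" for '3A', '12 A', '2.75 angstroms', '0.123 Angstroms';

# Examples from the question that should be left alone.
print to_nm($_), "\n" for
    'I like the number 3. A very nice number.',
    'There are 27 Aardvarks in London Zoo.';
```

The first four lines come out as 0.3 nm, 1.2 nm, 0.275 nm and 0.0123 nm, and the two sentences are printed unchanged.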

Perhaps ironically, the -i.bak -pe one-liner is probably easier than writing it longhand. However, if you did want to:

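#!/usr/bin/perl
use strict;
use warnings;

my $filename = 'angstrom.txt';

open(my $input, '<', $filename) or die "Can't open $filename: $!";
open(my $output, '>', "$filename.new") or die "Can't write to $filename.new: $!";

while ( my $line = <$input> ) {
    # Divide each captured number by 10 and swap the unit for nm.
    $line =~ s|(\d+(?:\.\d+)?)\s*a(?:ngstroms?)?\b|$1 / 10 . " nm"|eig;
    print {$output} $line;
}
close($input);
close($output) or die "Can't close $filename.new: $!";

# Only replace the original once the new file has been written successfully.
rename("$filename.new", $filename) or die "Can't rename $filename.new: $!";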

Upvotes: 1
