Trey
Trey

Reputation: 63

How to substitute and replace number with letter in specific column

So I'm trying to figure out how to substitute the numbers in specific columns and replace that number with a letter. I'm coding in Perl


input.txt

10004226549870  
204226549870062001186000000000040008060802  
5032000067470318021604226549870062001186  
603200100001312293000522105000000456000131289  
603200200006545553000522109000004242000654555  
603200300002463923000522090000005571000246392  
603200400002635413000521196000248920000263541  
60320050000175960300052196600000101700017596  
603200600001054853004867190000001003000105485  
603200700001451223000522095000003981000145122  
75030000674703180216000700017222840007  
89999990674703180216000070001722284  
9000013  

Substitution Table

1 -> A  
2 -> B  
3 -> C  
4 -> D  
5 -> E  
6 -> F  
7 -> G  
8 -> H  
9 -> I  
0 -> {  

Substitution should occur under the following conditions:

Line starts with a "6", for the numbers located in column 17 and column 45.  
Line starts with a "7", for the number located in column 34.  
Line starts with a "8", for the number located in column 35.  

With the above substitution rules, the resulting file (using the current file as an example), would result in:

output.txt

10004226549870  
2042265498700620  
5032000067470318021604226549870062001186  
6032001000013122I3000522105000000456000131289  
6032002000065455E3000522109000004242000654555  
6032003000024639B3000522090000005571000246392  
6032004000026354A3000521196000248920000263541  
6032005000017596{300052196600000101700017596  
6032006000010548E3004867190000001003000105485  
6032007000014512B3000522095000003981000145122  
750300006747031802160007000172228D0007  
8999999067470318021600007000172228D  
9000013  

**MyCode

my $fn = 'input.txt';
my $wr = 'output.txt';

my %repl = (
    1 => "A"
    2 => "B"
    3 => "C"
    4 => "D"
    5 => "E"
    6 => "F"
    7 => "G"
    8 => "H"
    9 => "I"
    0 => "{"
);

open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";

open ( my $ww, '>', $wr ) or die "Could not open file '$fn': $!";


my @lines;
while (my $line = <$fh>) {
    chomp $line;
    push @lines, $line;
    if ( $line =~ /^6/) {
       foreach my $key (sort keys %repl){
            substr($line, 17, 1) =~ s/\b$key\b/$repl{$key}/g
            substr($line, 45, 1) =~ s/\b$key\b/$repl{$key}/g

       }
    }

    elsif ($line =~ /^7/) {
       foreach my $key (sort keys %repl){
            substr($line, 34, 1) =~ s/\b$key\b/$repl{$key}/g

       }
    }
    elsif ($line =~ /^8/) {
       foreach my $key (sort keys %repl){
            substr($line, 35, 1) =~ s/\b$key\b/$repl{$key}/g

       }

    }
    else {
         print $ww $_;
    }
}
close $fh;

close $ww;

Upvotes: 3

Views: 216

Answers (3)

elcaro
elcaro

Reputation: 2317

I generally like to avoid using the regex engine unless I'm matching a pattern. Here's how I might do it.

use strict;
use warnings;

my %table = (
    1 => 'A', 2 => 'B', 3 => 'C', 4 => 'D', 5 => 'E',
    6 => 'F', 7 => 'G', 8 => 'H', 9 => 'I', 0 => '{',
);

my %subs = (
    6 => [17, 45],
    7 => [34],
    8 => [35],
);

while (my $line = <DATA>) {
    chomp $line;
    my $cols = $subs{ substr($line, 0, 1) } // [];
    for my $col (@$cols) {
        next if length($line) < $col;
        my $char = substr($line, $col - 1, 1);
        substr($line, $col - 1, 1) = $table{$char};
    }
    print "$line\n";
}

__DATA__
10004226549870
204226549870062001186000000000040008060802
5032000067470318021604226549870062001186
603200100001312293000522105000000456000131289
603200200006545553000522109000004242000654555
603200300002463923000522090000005571000246392
603200400002635413000521196000248920000263541
60320050000175960300052196600000101700017596
603200600001054853004867190000001003000105485
603200700001451223000522095000003981000145122
75030000674703180216000700017222840007
89999990674703180216000070001722284
9000013

Upvotes: 2

zdim
zdim

Reputation: 66899

This is good and the only problem I can see is that you may have miscounted, as substr counts start with 0. The other thing is that every line needs to be written to the output file, so lose that final else and just write out the line, changed or not.

However, I'd offer an improvement. Probing for a character by running a regex for each key is not needed, and is wasteful and expensive. There are other ways, which are simpler and efficient.

Using substr we need two calls when its replacement is decided dynamically

substr $line, 16, 1, $repl{ substr $line, 16, 1 };

This replaces a substring of length 1 at position 17 (counting starts with 0) with a value in %repl for a key which is the substring of length 1 at position 17.

But this line on its own can have a problem if the string was shorter than the required position. Ths is in fact the case with the line that gets { as replacement, which has only 44 characters. Given rather "specific" behavior of substr with edge cases it's better to not reach for "fixes" but check

# for $col == 45 (etc)
if (length $line >= $col) { 
    substr $line, $col-1, 1, $repl{ substr $line, $col-1, 1 };
}

Those positions can be prepared in an array to avoid scattered magic numbers or scalars.

Another way is with a regex

$line =~ s/.{16}\K(.)/$repl{$1}/;

This matches 16 characters, and \K drops those matches so they don't need to be put back, and the next character is captured and then replaced by its value in the hash. Now a possibly non-existing position (past end of string) doesn't require special handling since the match simply fails.

Either of these two above can be used instead of the foreach loops over the %repl keys and your code should work as it stands, except for the else branch which need be removed and $line just printed. I also hope that you do have

use warnings;
use strict;

somewhere at the beginning, as these are strongly recommended in every program.

Matching a line on either 6/7/8 is OK but you can instead build a hash with those numbers as keys, and their values be code references performing the needed replacements. If these numbers change often (or if there can be many more) then consider doing something like that.


Finally, the print $ww $_; in the else branch would draw warnings since once a topicalizer is used in the while condition (the $line) then $_ isn't provided anymore (it's undefined).

However, that's probably a matter of posting since it uses $ww while the defined handle is $wr, and there are a few other typos.

Upvotes: 3

Polar Bear
Polar Bear

Reputation: 6798

Define an array @repl with characters, create hash %match, use keys of %match hash as input regex filter.

Read data and if line matches to regex filter pass it for further processing. Regex captures matching digit which we use to extract positions from %match hash later.

Split input line into an array of digits and replace digit in position of interest with substitute from @repl array.

Assemble the line back with join and print result.

use strict;
use warnings;
use feature 'say';

my @repl   = split '', '{ABCDEFGHI';
my %match  = ( 6 => [17,45], 7 => [34], 8 => [35] );
my $filter = join('|',keys %match);
my $re = qr/^($filter)/;

while(my $line = <DATA>) {
    chomp $line;
    if( $line =~ /$re/ ) {
        my @nums = split '', $line;
        for my $pos ( @{$match{$1}} ) {
            my $i = $pos-1;
            $nums[$i] = $repl[$nums[$i]] if defined $nums[$i];
        }
        $line = join '', @nums;
    }
    say $line;
}

__DATA__
10004226549870  
204226549870062001186000000000040008060802
5032000067470318021604226549870062001186
603200100001312293000522105000000456000131289
603200200006545553000522109000004242000654555
603200300002463923000522090000005571000246392
603200400002635413000521196000248920000263541
60320050000175960300052196600000101700017596
603200600001054853004867190000001003000105485
603200700001451223000522095000003981000145122
75030000674703180216000700017222840007
89999990674703180216000070001722284
9000013

Output

10004226549870
204226549870062001186000000000040008060802
5032000067470318021604226549870062001186
6032001000013122I300052210500000045600013128I
6032002000065455E300052210900000424200065455E
6032003000024639B300052209000000557100024639B
6032004000026354A300052119600024892000026354A
6032005000017596{300052196600000101700017596
6032006000010548E300486719000000100300010548E
6032007000014512B300052209500000398100014512B
750300006747031802160007000172228D0007
8999999067470318021600007000172228D
9000013

Upvotes: 1

Related Questions