Jaeyoung Park

Reputation: 339

Perl - Matching with don't-care conditions and reading CSV files

I posted a question about the don't-care symbol (X) in Perl before. I now have working code, but it does not work when reading from files.

Let's say I have a 50-bit binary input and a database. If the input matches an entry in the database, I want to return a predefined value.

Let's say the data in the database is 11001100100010110111110110101001000010110101111101 .

If the input is 11XX11001000101101111101101010010000101101011111X1, I would like to treat it as a match because X can be 1 or 0. I know I could split the 50 bits into 50 single bits and handle X as an exception, but I would prefer to handle all 50 bits together.

In my code (dontcare.pl), the first part works with an internally defined input and database. However, I would like to read an input file (input_test.txt) and a database file (database.txt) that includes other information and do the same thing.

dontcare.pl:

#!/usr/bin/perl 

####### 1st part, Internal string input and database
my $input = '11XX11001000101101111101101010010000101101011111X1';
( my $mask = $input ) =~ tr/X01/\x00\xFF\xFF/;
( my $targ = $input ) =~ tr/X/\x00/;

for my $num_bin (qw(
   11001100100010110111110110101001000010110101111101
   10101100100010110111110110101001000010110101111101
)) {
   if (($num_bin & $mask) eq $targ) {
      print "$num_bin matches\n";
   } else {
      print "$num_bin doesn't match\n";      
   }
}


####### 2nd part, Reading input and database files 
        print " Reading files\n";      
##### Read input
my @Dinput=do{
    open my $Dfh,"<","input_test.txt" or die("Cannot open an input file $!");
    <$Dfh>;
};

##### Read database
open(CSV,'database.txt')||die("Cannot open db file $!");
my @Ddb;

while(<CSV>){
    my @row=split(/\t/,$_);
    push(@Ddb,\@row);
}
close CSV || die $!;


for (my $n=0; $n < (scalar @Dinput); $n +=1) {

for (my $i=0; $i < (scalar @Ddb); $i +=2) {
    (my $Dmask = $Dinput[$n]) =~ tr/X01/\x00\xFF\xFF/;
    (my $Dtarg = $Dinput[$n]) =~ tr/X/\x00/;

    if (( $Ddb[$i][1] & $Dmask) eq $Dtarg) {
        print "$Ddb[$i][1] matched\n";
    } else {
        print "$Ddb[$i][1] didn't match\n";      
    }
}

}

input_test.txt : (an input file containing two inputs)

11XX11001000101101111101101010010000101101011111X1
1000011000111101001011110111001100100101111000010X

database.txt : (a database file. The 50-bit binary value is in the second column. Other information is also in the file)

0.1 11001100100010110111110110101001000010110101111101  rml_irf_old_e_cwp_e[1]  rml_irf_new_e_cwp_e[1]  rml_irf_swap_even_e rml_irf_old_e_cwp_e[0]  rml_irf_new_e_cwp_e[0]  rml_irf_swap_odd_e
0.1 11101100110010011011001101100111001001100000010011  3.923510310023e-06  3.19470818154393e-08    7.05437377900141e-10    7.05437377900141e-10    4.89200539851702e-17    5.01433479478681e-19
0.1 10000110001111010010111101110011001001011110000100  rml_irf_new_e_cwp_e[1]  rml_irf_new_e_cwp_e[0]
0.1 01110111010010000000101001000001100011011100011111  0.052908822741908   2.7185508579738e-05

I guess it is a type-casting problem. The first part has a string input and a string database, so it works. However, the second part seems to read the input and data from the files as integers. I searched for type casting and found that there is no casting function in Perl (or I may be wrong). Please let me know any ideas or recommendations for resolving this issue.

In short, I want don't-care matching to work with input and database files. Please let me know if you have other ways to do this. (I used a temporary value change in the input file)

Upvotes: 0

Views: 142

Answers (2)

Sobrique

Reputation: 53478

Well, type casting - that doesn't exist in the way you think, because Perl doesn't really care whether something is a string or a number - it treats a value as one or the other depending on context. Anything you read from a file arrives as a string.
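As a minimal sketch of what "depending on context" means (the value "42\n" is just an assumed sample line, not from your files): the same scalar behaves as a number under + and as text under length, and nothing gets cast when it is read.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A line as it would arrive from a file: a string, trailing newline included.
# (Sample value assumed for illustration.)
my $line = "42\n";

# Numeric context: + converts the string to a number for this operation only.
my $sum = $line + 1;

# String context: length counts characters, and the newline is one of them.
my $len = length $line;

print "sum=$sum len=$len\n";    # prints "sum=43 len=3"
```

Note the length of 3: the newline is still part of the value, which matters for the file-reading problem.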

However, there are things like pack and unpack which convert raw binary data to a more usable representation. E.g. from (raw) binary to hex, and back again. These don't seem to apply, because your input isn't binary - it's just text.
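For reference, a small sketch of what pack and unpack do, using the standard 'H*' (hex string) and 'B8' (bit string) templates. The sample bytes "AB" and "A" are assumptions chosen for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Raw bytes to a hex string and back again.
my $hex  = unpack 'H*', 'AB';      # '4142' (0x41 = 'A', 0x42 = 'B')
my $back = pack   'H*', $hex;      # 'AB'

# One raw byte to a string of '0'/'1' characters and back again.
my $bits = unpack 'B8', 'A';       # '01000001'
my $byte = pack   'B8', $bits;     # 'A'

print "$hex $back $bits $byte\n";  # prints "4142 AB 01000001 A"
```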

But I have to say - I think you're tackling this a harder way than you need to (unless I'm misunderstanding your problem) and you don't actually need to do binary transforming at all:

#!/usr/bin/perl

use warnings;
use strict;

#or read this from a file
my @input = qw ( 11XX11001000101101111101101010010000101101011111X1
                 1000011000111101001011110111001100100101111000010X );
#replace 'X' with '.' which is the regex "don't care" character.                 
s/X/./g for @input;
#compile a regex made of these two patterns. 
my $search = join ( "|", @input );
   $search = qr/$search/; 

print "Compiled input patterns into a regex of: \n";
print $search,"\n";

#iterate database (pasted in 'data' block for illustrative purposes)
while ( <DATA> ) {
    my ( $id, $target, @rest ) = split; #split on whitespace. 
              # you are using tab sep, so you might prefer split /\t/;
    #field 1 = ID
    #field 2 = $target
    #everything else = @rest
    #compare $target with the regex we compiled above, and print the 
    #current line if it matches. 
    print if $target =~ /$search/;
}


__DATA__
0.1 11001100100010110111110110101001000010110101111101  rml_irf_old_e_cwp_e[1]  rml_irf_new_e_cwp_e[1]  rml_irf_swap_even_e rml_irf_old_e_cwp_e[0]  rml_irf_new_e_cwp_e[0]  rml_irf_swap_odd_e
0.1 11101100110010011011001101100111001001100000010011  3.923510310023e-06  3.19470818154393e-08    7.05437377900141e-10    7.05437377900141e-10    4.89200539851702e-17    5.01433479478681e-19
0.1 10000110001111010010111101110011001001011110000100  rml_irf_new_e_cwp_e[1]  rml_irf_new_e_cwp_e[0]
0.1 01110111010010000000101001000001100011011100011111  0.052908822741908   2.7185508579738e-05

This then, for your database, prints:

0.1 11001100100010110111110110101001000010110101111101  rml_irf_old_e_cwp_e[1]  rml_irf_new_e_cwp_e[1]  rml_irf_swap_even_e rml_irf_old_e_cwp_e[0]  rml_irf_new_e_cwp_e[0]  rml_irf_swap_odd_e
0.1 10000110001111010010111101110011001001011110000100  rml_irf_new_e_cwp_e[1]  rml_irf_new_e_cwp_e[0]
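To see what the X-to-. substitution and the alternation do, here is a sketch on 3-bit patterns (the patterns and targets are made up for illustration). The ^(?:...)$ anchoring is my addition and is not in the code above: without it a longer field such as 1100 would match because it contains 110. With fixed-width 50-bit fields and 50-bit patterns it makes no practical difference.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical 3-bit patterns; X is the don't-care symbol.
my @patterns = qw( 1X0 01X );

# X becomes '.', which matches any single character.
s/X/./g for @patterns;                  # ('1.0', '01.')

# One alternation covers every pattern; the anchors force a whole-field match.
my $alt    = join '|', @patterns;
my $search = qr/^(?:$alt)$/;

my %result;
for my $target (qw( 110 100 011 111 1100 )) {
    $result{$target} = $target =~ $search ? 'matches' : 'no match';
    printf "%-4s %s\n", $target, $result{$target};
}
# 110, 100 and 011 match; 111 and 1100 do not.
```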

In terms of reading patterns from a particular file - the most likely reason it would break is forgetting to chomp the patterns as you read them, which leaves a trailing newline on every pattern.

So you'd load them like this (tested with the above data):

#!/usr/bin/perl

use warnings;
use strict;

#Read patterns from file
open ( my $input_fh, '<', 'patterns.txt' ) or die $!; 
chomp ( my @input = <$input_fh> );
close ( $input_fh );
#replace 'X' with '.' which is the regex "don't care" character.                 
s/X/./g for @input;
#compile a regex made of these two patterns. 
my $search = join ( "|", @input );
   $search = qr/$search/; 

#iterate database (read from a file this time)
open ( my $data, '<', 'database.txt' ) or die $!;
while ( <$data> ) {
    my ( $id, $target, @rest ) = split;
    #print if the target line matches
    print if $target =~ /$search/;
}
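A sketch of why the missing chomp matters, on a made-up 3-bit pattern. The helper names regex_match and mask_match are hypothetical. As far as I can tell, the same trailing newline is also what breaks the bitmask version in the question: a string & is truncated to the shorter operand, so the result can never be eq to a target that still carries the newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Regex approach: X -> '.', then match the field against the pattern.
sub regex_match {
    my ( $pattern, $field ) = @_;
    ( my $re = $pattern ) =~ s/X/./g;
    return $field =~ /$re/ ? 1 : 0;
}

# Bitmask approach from the question: mask out the X positions, then compare.
sub mask_match {
    my ( $pattern, $field ) = @_;
    ( my $mask = $pattern ) =~ tr/X01/\x00\xFF\xFF/;
    ( my $targ = $pattern ) =~ tr/X/\x00/;
    return ( ( $field & $mask ) eq $targ ) ? 1 : 0;
}

my $field = '110';       # a database field (assumed sample)
my $raw   = "1X0\n";     # a pattern line exactly as read from a file
chomp( my $clean = $raw );

my @results = (
    regex_match( $raw,   $field ),    # 0: the regex demands a literal newline
    mask_match( $raw,    $field ),    # 0: the target is one character longer
    regex_match( $clean, $field ),    # 1
    mask_match( $clean,  $field ),    # 1
);
print "@results\n";                   # prints "0 0 1 1"
```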

Specifically with your code (and that of your answer):

  • Turn on use strict; use warnings; - it's important for troubleshooting.
  • You don't need to double-loop, because turning your input patterns into an alternation regex does that for you (more efficiently).
  • Always use 3-arg open with lexical file handles: open ( my $input_fh, '<', 'patterns.txt' ) or die $!. A file handle like CSV is a global (and doesn't auto-close like a lexical does when it goes out of scope).
  • $i < (scalar @Ddb) is redundant. < already puts the array in scalar context, so you can just write $i < @Ddb and get the same result.
  • perltidy is a good thing for code formatting. perltidy -pbp will format based on "perl best practices".
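A short sketch pulling a few of these points together. To keep it self-contained it opens an in-memory string instead of database.txt, and the two rows in it are assumed sample data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for database.txt (tab-separated sample rows, assumed).
my $text = "0.1\t110\tfoo\n0.1\t011\tbar\n";

my @db;
{
    # 3-arg open with a lexical handle; a reference to a scalar opens it as a file.
    open my $fh, '<', \$text or die "Cannot open in-memory data: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        push @db, [ split /\t/, $line ];
    }
}    # $fh goes out of scope here and is closed automatically

# An array in a numeric comparison yields its element count; no scalar() needed.
my $count = 0;
for ( my $i = 0 ; $i < @db ; $i++ ) {
    $count++;
}

print "rows: $count, second field of row 0: $db[0][1]\n";
# prints "rows: 2, second field of row 0: 110"
```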

Upvotes: 3

Jaeyoung Park

Reputation: 339

Thank you for the help - @Sobrique

My original code made things more complicated than necessary. What I actually wanted was ".", the regex don't-care symbol, and a way to handle it. I also needed to read CSV files as the input and the database. @Sobrique helped me a lot to resolve all the issues, and the following is my final code.

my code:

#!/usr/bin/perl 

##### Read input

open my $input_fh, '<', 'input_test.txt' or die $!;
chomp ( my @input = <$input_fh> );

#replace 'X' with '.' which is the regex "don't care" character.                 
s/X/./g for @input;
#compile a regex made of these two patterns. 
#my $search = join ( "|", @input ); 
#   $search = qr/$search/;      
my $search = join ( "|", $input[0] ); 
   $search = qr/$search/;   

##### Read database
open(CSV,'database.txt')||die("Cannot open db file $!");
my @Ddb;
while(<CSV>){
    my @row=split(/\t/,$_);
    push(@Ddb,\@row);
}
close(CSV) or die $!;   # "close CSV || die" would never reach die


#iterate over the database rows read from database.txt
for (my $n=0; $n < (scalar @input); $n +=2) {

for (my $i=0; $i < (scalar @Ddb); $i +=2) {
    if ($Ddb[$i][1] =~ /$search/) {
        print "$Ddb[$i][1] matched\n";
        print "$Ddb[$i][2] \n";
    } 
#else {
#       print "$Ddb[$i][1] didn't match\n";      
#       }
}

}

input_test.txt :

10001000110010001001110111000011001010110010000011
10111101010011000101001011110000001110101110010011

Upvotes: 0
