Reputation: 339
I posted a question about the don't-care symbol (X) in Perl before. I now have working code, but it does not work when the input and database are read from files.
Let's say I have a 50-bit binary input and a database. If the input matches an entry in the database, I want to return a pre-defined value.
Let's say the data in the database is 11001100100010110111110110101001000010110101111101 .
If the input is 11XX11001000101101111101101010010000101101011111X1, I would like to treat it as a match, because X can be 1 or 0. I know I could split the 50 bits into 50 single bits and special-case X, but I would prefer to handle all 50 bits together.
In my code (dontcare.pl), the first part works with an internally defined input and database. However, I would like to read an input file (input_test.txt) and a database file (database.txt) that includes other information, and do the same thing.
dontcare.pl:
#!/usr/bin/perl
####### 1st part, Internal string input and database
my $input = '11XX11001000101101111101101010010000101101011111X1';
( my $mask = $input ) =~ tr/X01/\x00\xFF\xFF/;
( my $targ = $input ) =~ tr/X/\x00/;
for my $num_bin (qw(
11001100100010110111110110101001000010110101111101
10101100100010110111110110101001000010110101111101
)) {
if (($num_bin & $mask) eq $targ) {
print "$num_bin matches\n";
} else {
print "$num_bin doesn't match\n";
}
}
####### 2nd part, Reading input and database files
print " Reading files\n";
##### Read input
my @Dinput=do{
open my $Dfh,"<","input_test.txt" or die("Cannot open an input file $!");
<$Dfh>;
};
##### Read database
open(CSV,'database.txt')||die("Cannot open db file $!");
my @Ddb;
while(<CSV>){
my @row=split(/\t/,$_);
push(@Ddb,\@row);
}
close CSV || die $!;
for (my $n=0; $n < (scalar @Dinput); $n +=1) {
for (my $i=0; $i < (scalar @Ddb); $i +=2) {
(my $Dmask = $Dinput[$n]) =~ tr/X01/\x00\xFF\xFF/;
(my $Dtarg = $Dinput[$n]) =~ tr/X/\x00/;
if (( $Ddb[$i][1] & $Dmask) eq $Dtarg) {
print "$Ddb[$i][1] matched\n";
} else {
print "$Ddb[$i][1] didn't match\n";
}
}
}
input_test.txt : (an input file containing two inputs)
11XX11001000101101111101101010010000101101011111X1
1000011000111101001011110111001100100101111000010X
database.txt : (a database file. The 50-bit binary value is in the second column. Other information is also in the file)
0.1 11001100100010110111110110101001000010110101111101 rml_irf_old_e_cwp_e[1] rml_irf_new_e_cwp_e[1] rml_irf_swap_even_e rml_irf_old_e_cwp_e[0] rml_irf_new_e_cwp_e[0] rml_irf_swap_odd_e
0.1 11101100110010011011001101100111001001100000010011 3.923510310023e-06 3.19470818154393e-08 7.05437377900141e-10 7.05437377900141e-10 4.89200539851702e-17 5.01433479478681e-19
0.1 10000110001111010010111101110011001001011110000100 rml_irf_new_e_cwp_e[1] rml_irf_new_e_cwp_e[0]
0.1 01110111010010000000101001000001100011011100011111 0.052908822741908 2.7185508579738e-05
I guess it is a type casting problem. The first part has a string input and a string database, so it works. However, I suspect the second part automatically reads the input and data from the files as integers. I searched for type casting and found that there is no casting function in Perl (or I am wrong). Please let me know any ideas or recommendations to resolve this issue.
In short, I want don't-care matching to work with input and database files. Please let me know if you have other ways to do this. (I temporarily changed a value in the input file.)
Upvotes: 0
Views: 142
Reputation: 53478
Well, type casting doesn't exist in the way you're thinking of, because Perl doesn't really care whether something is a string or a number: it does the right thing depending on context.
However, there are things like pack and unpack, which convert raw binary data to a more usable representation, e.g. from (raw) binary to hex and back again. These don't seem to apply, because your input isn't binary: it's just text.
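Purely as an illustration of what pack and unpack do, here is a minimal sketch. The 8-bit value is a made-up example, not data from the question, and none of this is needed for the matching problem itself.

```perl
use strict;
use warnings;

# Made-up 8-bit example value (not taken from the question's database).
my $bits = '11001100';

my $raw  = pack 'B8', $bits;     # a single raw byte with the value 0xCC
my $hex  = unpack 'H2', $raw;    # that byte rendered as two hex digits
my $back = unpack 'B8', $raw;    # and converted back to a bit string

print "$bits -> 0x$hex -> $back\n";
```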
But I have to say, I think you're tackling this in a harder way than you need to (unless I'm misunderstanding your problem), and you don't actually need to do any binary transformation at all:
#!/usr/bin/perl
use warnings;
use strict;
#or read this from a file
my @input = qw ( 11XX11001000101101111101101010010000101101011111X1
1000011000111101001011110111001100100101111000010X );
#replace 'X' with '.' which is the regex "don't care" character.
s/X/./g for @input;
#compile a regex made of these two patterns.
my $search = join ( "|", @input );
$search = qr/$search/;
print "Compiled input patterns into a regex of: \n";
print $search,"\n";
#iterate database (pasted in 'data' block for illustrative purposes)
while ( <DATA> ) {
my ( $id, $target, @rest ) = split; #split on whitespace.
# you are using tab sep, so you might prefer split /\t/;
#field 1 = ID
#field 2 = $target
#everything else = @rest
#compare $target with the regex we compiled above, and print the
#current line if it matches.
print if $target =~ /$search/;
}
__DATA__
0.1 11001100100010110111110110101001000010110101111101 rml_irf_old_e_cwp_e[1] rml_irf_new_e_cwp_e[1] rml_irf_swap_even_e rml_irf_old_e_cwp_e[0] rml_irf_new_e_cwp_e[0] rml_irf_swap_odd_e
0.1 11101100110010011011001101100111001001100000010011 3.923510310023e-06 3.19470818154393e-08 7.05437377900141e-10 7.05437377900141e-10 4.89200539851702e-17 5.01433479478681e-19
0.1 10000110001111010010111101110011001001011110000100 rml_irf_new_e_cwp_e[1] rml_irf_new_e_cwp_e[0]
0.1 01110111010010000000101001000001100011011100011111 0.052908822741908 2.7185508579738e-05
This then, for your database, prints:
0.1 11001100100010110111110110101001000010110101111101 rml_irf_old_e_cwp_e[1] rml_irf_new_e_cwp_e[1] rml_irf_swap_even_e rml_irf_old_e_cwp_e[0] rml_irf_new_e_cwp_e[0] rml_irf_swap_odd_e
0.1 10000110001111010010111101110011001001011110000100 rml_irf_new_e_cwp_e[1] rml_irf_new_e_cwp_e[0]
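One possible refinement, sketched here under two assumptions that are not in the code above: that X should only ever stand for 0 or 1 (as the question describes), and that the whole field has to match rather than a substring. The helper name build_search is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: compile don't-care patterns into one anchored regex.
# Assumes each X may only match a single 0 or 1.
sub build_search {
    my @patterns = @_;
    my @alternatives = map { my $p = $_; $p =~ s/X/[01]/g; $p } @patterns;
    my $alternation  = join '|', @alternatives;
    # \A and \z anchor the match to the entire string.
    return qr/\A(?:$alternation)\z/;
}

my $search = build_search('11XX', '100X');
for my $target (qw( 1100 1001 11001 11a0 0111 )) {
    print "$target: ", ( $target =~ $search ? "match" : "no match" ), "\n";
}
```

With the plain '.' version, a longer field that merely contains a matching run of bits would also match; anchoring avoids that if it matters for your data.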
In terms of reading patterns from a particular file, the most likely reason that would break is if you forget to chomp the patterns as you read them.
So you'd load them like this (tested with the above data):
#!/usr/bin/perl
use warnings;
use strict;
#Read patterns from file
open ( my $input_fh, '<', 'patterns.txt' ) or die $!;
chomp ( my @input = <$input_fh> );
close ( $input_fh );
#replace 'X' with '.' which is the regex "don't care" character.
s/X/./g for @input;
#compile a regex made of these two patterns.
my $search = join ( "|", @input );
$search = qr/$search/;
#iterate database (pasted in 'data' block for illustrative purposes)
open ( my $data, '<', 'database.txt' ) or die $!;
while ( <$data> ) {
my ( $id, $target, @rest ) = split;
#print if the target line matches
print if $target =~ /$search/;
}
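For comparison, the bitmask approach from the question appears to fail for the same reason: a line read from a file still ends in a newline. Perl's string & returns a result as long as the shorter operand, so the masked value has 50 characters while the target built from the unchomped line has 51, and eq is false. A minimal sketch, where matches_mask is a hypothetical helper wrapping the question's own tr/& logic:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the mask logic from the question:
# true when $candidate agrees with $pattern at every non-X position.
# Like the original, it does no length validation of its own.
sub matches_mask {
    my ( $pattern, $candidate ) = @_;
    ( my $mask = $pattern ) =~ tr/X01/\x00\xFF\xFF/;    # X -> NUL, bits -> 0xFF
    ( my $targ = $pattern ) =~ tr/X/\x00/;              # X -> NUL, bits kept
    return ( ( $candidate & $mask ) eq $targ ) ? 1 : 0;
}

my $db_value = '11001100100010110111110110101001000010110101111101';

# A pattern as it arrives from <$fh>, trailing newline included.
my $raw_line = "11XX11001000101101111101101010010000101101011111X1\n";
print "unchomped: ", ( matches_mask( $raw_line, $db_value ) ? "match" : "no match" ), "\n";

# The same pattern after chomp.
chomp( my $pattern = $raw_line );
print "chomped:   ", ( matches_mask( $pattern, $db_value ) ? "match" : "no match" ), "\n";
```

So either approach should work once the patterns are chomped; the regex version is just shorter.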
Specifically with your code (and that of your answer):
- use strict; use warnings; is important for troubleshooting.
- Prefer a lexical filehandle with three-argument open, e.g. open ( my $input_fh, '<', 'patterns.txt' ) or die $!, because a file handle of CSV is a global (and doesn't auto close like a lexical does when it goes out of scope).
- $i < (scalar @Ddb) is redundant. < already puts @Ddb in scalar context, so you can just write $i < @Ddb and get the same result.
- perltidy is a good thing for code formatting. perltidy -pbp will format based on "Perl Best Practices".
Upvotes: 3
Reputation: 339
Thank you for the help, @Sobrique.
My original approach made my code more complicated than it needed to be. What I actually wanted was ".", the regex don't-care symbol, and a way to handle it. I also needed to read the input and the database from files. @Sobrique helped me resolve all the issues, and the following is my final code.
My code:
#!/usr/bin/perl
##### Read input
open my $input_fh, '<', 'input_test.txt' or die $!;
chomp ( my @input = <$input_fh> );
#replace 'X' with '.' which is the regex "don't care" character.
s/X/./g for @input;
#compile a regex made of these two patterns.
#my $search = join ( "|", @input );
# $search = qr/$search/;
my $search = join ( "|", $input[0] );
$search = qr/$search/;
##### Read database
open(CSV,'database.txt')||die("Cannot open db file $!");
my @Ddb;
while(<CSV>){
my @row=split(/\t/,$_);
push(@Ddb,\@row);
}
close(CSV) || die $!;
#iterate over the database rows read from database.txt
for (my $n=0; $n < (scalar @input); $n +=2) {
for (my $i=0; $i < (scalar @Ddb); $i +=2) {
if ($Ddb[$i][1] =~ /$search/) {
print "$Ddb[$i][1] matched\n";
print "$Ddb[$i][2] \n";
}
#else {
# print "$Ddb[$i][1] didn't match\n";
# }
}
}
input_test.txt :
10001000110010001001110111000011001010110010000011
10111101010011000101001011110000001110101110010011
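A tidier variant of the same idea, as a sketch only: it assumes the database is tab-separated with the bit string in the second column (as in the split above), checks every row against every input pattern, and uses lexical filehandles. The helper name matching_rows and the short 4-bit sample data are made up for illustration; in-memory filehandles stand in for input_test.txt and database.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the database rows (as array refs) whose
# second tab-separated column matches any of the don't-care patterns.
sub matching_rows {
    my ( $pattern_fh, $db_fh ) = @_;

    chomp( my @patterns = <$pattern_fh> );
    @patterns = grep { length } @patterns;    # ignore blank lines
    s/X/./g for @patterns;                    # X becomes the regex wildcard
    my $alternation = join '|', @patterns;
    my $search      = qr/\A(?:$alternation)\z/;

    my @hits;
    while ( my $line = <$db_fh> ) {
        chomp $line;
        my @row = split /\t/, $line;
        next unless defined $row[1];          # skip rows without a 2nd column
        push @hits, \@row if $row[1] =~ $search;
    }
    return @hits;
}

# Made-up sample data; real code would open the two files instead.
my $patterns = "11XX\n100X\n";
my $database = "0.1\t1100\tfoo\n0.1\t0111\tbar\n0.1\t1001\tbaz\n";
open my $pattern_fh, '<', \$patterns or die $!;
open my $db_fh,      '<', \$database or die $!;

print join( "\t", @$_ ), "\n" for matching_rows( $pattern_fh, $db_fh );
```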
Upvotes: 0