I have an array (@array) which contains a list of elements. For each element I need to check whether it exists in a master file. If the element exists in the master file, then the same line of the master file must also contain the string YES (in the 5th position). Such elements should be stored in a different array.
Currently my script uses two shell grep commands to achieve this. How can I write the same thing in Perl?
...
use Data::Dumper;
my @new_array;
my @array = ('RT0AC1', 'WG3RA3');
print Dumper(\@array);
foreach ( @array ){
    # One shell pipeline per element: each run rescans the whole file
    my $line = `grep $_ "master_file.csv" | grep -i yes`;
    next unless($line);
    push( @new_array, $_ );
}
print Dumper(\@new_array);
...
where master_file.csv looks like this:
101,RT0AC1,CONNECTED,FAULTY,NO
102,RT0AC1,CONNECTED,WORKING,YES
103,RT0AC1,NOT CONNECTED,WORKING,NO
104,WG3RA3,NOT CONNECTED,DISABLED,NO
105,WG3RA3,CONNECTED,WORKING,NO
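For reference, the two shell greps translate almost literally into Perl's built-in `grep`, which filters a list using a block. The sketch below is a minimal illustration that holds the sample lines in memory (so as written it is not meant for a file with millions of records). Like `grep -i yes`, the second test matches "yes" anywhere in the line, not only in the 5th field.

```perl
use strict;
use warnings;

# Sample records from master_file.csv above, held in memory for illustration only
my @lines = (
    '101,RT0AC1,CONNECTED,FAULTY,NO',
    '102,RT0AC1,CONNECTED,WORKING,YES',
    '103,RT0AC1,NOT CONNECTED,WORKING,NO',
    '104,WG3RA3,NOT CONNECTED,DISABLED,NO',
    '105,WG3RA3,CONNECTED,WORKING,NO',
);

my @array = ('RT0AC1', 'WG3RA3');
my @new_array;

foreach my $element (@array) {
    # Equivalent of: grep $element master_file.csv | grep -i yes
    # \Q...\E treats the element as a literal string, not a regex
    my @hits = grep { /\Q$element\E/ && /yes/i } @lines;
    push @new_array, $element if @hits;
}

print "@new_array\n";    # prints "RT0AC1"
```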
So here $line gets the value 102,RT0AC1,CONNECTED,WORKING,YES and the element RT0AC1 is stored in @new_array.
How can I avoid using backticks (`) and two greps to achieve this? I am trying to do this in pure Perl. Note that master_file.csv contains millions of records.
Build a regex that matches the records of interest, split each matching line into fields and compare field #5 to YES. If it matches, increment a count for field #2 in the %match hash.
Once the file is processed, the %match hash has field #2 of each matched record as a key, and the value is the number of times that field appeared with YES in the file.
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my %match;
my @look_for = qw(RT0AC1 WG3RA3);
my $re_filter = join('|',@look_for);
while(<DATA>) {
    chomp;
    next unless /$re_filter/;
    my @data = split(',',$_);
    $match{$data[1]}++ if $data[4] eq 'YES';
}
say Dumper(\%match);
__DATA__
101,RT0AC1,CONNECTED,FAULTY,NO
102,RT0AC1,CONNECTED,WORKING,YES
103,RT0AC1,NOT CONNECTED,WORKING,NO
104,WG3RA3,NOT CONNECTED,DISABLED,NO
105,WG3RA3,CONNECTED,WORKING,NO
Output
$VAR1 = {
'RT0AC1' => 1
};
Remove the __DATA__ section to get the final code, and pass the filename on the command line to process the file with the data of interest:
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my %match;
my @look_for = qw(RT0AC1 WG3RA3);
my $re_filter = join('|',@look_for);
while(<>) {
    chomp;
    next unless /$re_filter/;
    my @data = split(',',$_);
    $match{$data[1]}++ if $data[4] eq 'YES';
}
say Dumper(\%match);
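One caveat with this approach: `$re_filter` is an unanchored alternation, so it also lets through lines whose ID merely contains a wanted ID, and the hash key is whatever field #2 holds. The sketch below uses a made-up record (`RT0AC12` is not in the question's data) to show the effect, and one possible fix that checks field #2 against an exact-match lookup hash instead.

```perl
use strict;
use warnings;

my @look_for  = qw(RT0AC1 WG3RA3);
my $re_filter = join('|', @look_for);
my %wanted    = map { $_ => 1 } @look_for;    # exact-match lookup table

# Hypothetical record whose ID only contains "RT0AC1" as a substring
my $line = '106,RT0AC12,CONNECTED,WORKING,YES';
my @data = split(',', $line);

my (%loose, %strict);

# Logic as above: substring regex filter, then count field #2
$loose{$data[1]}++ if $line =~ /$re_filter/ && $data[4] eq 'YES';

# Stricter logic: field #2 must be exactly one of the wanted IDs
$strict{$data[1]}++ if exists $wanted{$data[1]} && $data[4] eq 'YES';

print join(',', sort keys %loose),  "\n";    # prints "RT0AC12"
print join(',', sort keys %strict), "\n";    # prints an empty line
```

Anchoring the regex (for example `^\d+,(?:$re_filter),`) would also work; the hash lookup has the advantage of not depending on regex escaping at all.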
An alternative version based on a regular expression, without using split:
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my %match;
my @look_for = qw(RT0AC1 WG3RA3);
my $re_filter = join('|',@look_for);
my $regex = qr/^\d+,($re_filter),[^,]+,[^,]+,YES$/;
# Read one line at a time; "for <DATA>" would load every line into memory first
while (<DATA>) { $match{$1}++ if /$regex/ }
say Dumper(\%match);
__DATA__
101,RT0AC1,CONNECTED,FAULTY,NO
102,RT0AC1,CONNECTED,WORKING,YES
103,RT0AC1,NOT CONNECTED,WORKING,NO
104,WG3RA3,NOT CONNECTED,DISABLED,NO
105,WG3RA3,CONNECTED,WORKING,NO
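The same regex can be written with the `/x` modifier so that each field is documented, and the IDs can be passed through `quotemeta` in case they ever contain regex metacharacters. This is a sketch under a few assumptions: the sample data is read through an in-memory filehandle as a stand-in for the real file, and the final step that turns `%match` into an array ordered like `@look_for` is an addition, not part of the answer above.

```perl
use strict;
use warnings;

my @look_for  = qw(RT0AC1 WG3RA3);
# quotemeta escapes any regex metacharacters in the IDs
my $re_filter = join('|', map { quotemeta } @look_for);

# Same pattern as above, one field per line; /x ignores the layout whitespace
my $regex = qr/
    ^ \d+ ,           # field 1: numeric record number
    ($re_filter) ,    # field 2: one of the wanted IDs (captured)
    [^,]+ ,           # field 3: connection state
    [^,]+ ,           # field 4: working state
    YES $             # field 5: must be exactly YES
/x;

# Sample data from the question, read through an in-memory filehandle
my $csv_text = <<'END';
101,RT0AC1,CONNECTED,FAULTY,NO
102,RT0AC1,CONNECTED,WORKING,YES
103,RT0AC1,NOT CONNECTED,WORKING,NO
104,WG3RA3,NOT CONNECTED,DISABLED,NO
105,WG3RA3,CONNECTED,WORKING,NO
END
open my $fh, '<', \$csv_text or die "Cannot open in-memory file: $!";

my %match;
while (my $line = <$fh>) {    # one line at a time, never the whole file
    $match{$1}++ if $line =~ $regex;
}
close $fh;

# IDs with at least one YES row, in the same order as in @look_for
my @new_array = grep { $match{$_} } @look_for;
print "@new_array\n";    # prints "RT0AC1"
```

Note that the comments inside the `/x` pattern avoid `$` and `@`, since Perl interpolates variables in a regex before the comments are stripped.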
Since all the words you're looking for are in the same column, it's easy to split the current line on commas, check whether the second column exists in a hash table, and check whether the fifth column is equal to "YES":
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use Data::Dumper;
my $filename = shift // "master_file.csv"; # Default filename if not given on command line
my @array = qw/RT0AC1 WG3RA3/; # Words you're looking for
my %words = map { $_ => 1 } @array; # Store them in a hash for fast lookup
my @new_array;
# Use Text::CSV_XS for non-trivial CSV files
open my $csv, "<", $filename;
while (<$csv>) {
    chomp;
    my @F = split /,/;
    push @new_array, $F[1] if exists $words{$F[1]} && $F[4] eq "YES";
}
print Dumper(\@new_array);
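The comment about Text::CSV_XS matters because a plain `split /,/` does not understand CSV quoting. The sketch below uses a hypothetical record (not from the question's data) with a quoted field containing a comma, to show how the column positions shift. With such a line, the check on `$F[4]` above would silently skip a genuine YES row; if master_file.csv can contain quoted fields, a real CSV parser is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical CSV record: field 3 is quoted and contains a comma
my $line = '106,WG3RA3,"CONNECTED, LOCKED",WORKING,YES';

my @F = split /,/, $line;

# The quoted comma creates an extra column, so every later index shifts by one
printf "%d fields\n", scalar @F;    # prints "6 fields" (a CSV parser would see 5)
print "F[4] = $F[4]\n";             # prints "F[4] = WORKING", not "YES"
```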