Leon

Reputation: 468

Why does the second "grep(/keyword/, @array)" test not work in Perl?

I wrote a Perl program to analyze my research data.
One function of the script counts the number of atoms in different groups. Two arrays, @former_lists and @modifier_lists, identify the two groups.

If the atom name is in group 1 (@former_lists), then $cnt_former_intf++;
if it is in group 2 (@modifier_lists), then $cnt_modf_intf++;
if it is an oxygen atom, then $cnt_oxyg_intf++;
otherwise, $cnt_other_intf++.

Below is the relevant part of my code.

......
my $flg_interface;
my @former_lists;
my $cnt_former_intf=0;
my $cnt_former_exbox=0;
my $cnt_modf_intf=0;
my $cnt_modf_exbox=0;
my $cnt_oxyg_intf=0;
my $cnt_oxyg_exbox=0;
my $cnt_other_intf=0;
my $cnt_other_exbox=0;
$former_lists[0]='SI';$former_lists[1]='AL';
my @modifier_lists;
$modifier_lists[0]='CA';$modifier_lists[1]='NA';
my $hash_key;
my %hash_type_spc;
$hash_type_spc{1}='SI';
$hash_type_spc{2}='AL';
$hash_type_spc{3}='CA';
$hash_type_spc{4}='O';
$hash_type_spc{5}='H';
$hash_type_spc{6}='NA';
my @atom_type;
$atom_type[1]=1;
$atom_type[2]=2;
$atom_type[3]=3;
$atom_type[4]=4;
$atom_type[5]=5;
$atom_type[6]=6;
my $atom_id;

for($atom_id=1;$atom_id<=17587;$atom_id++)
{  $hash_key=$atom_type[$atom_id];
  $_=uc($hash_type_spc{$hash_key});chomp($_);
  if ($flg_interface ==1)   #atom is in interface box
  {
    if($_ eq 'O'){$cnt_oxyg_intf++;}
    elsif($_ eq 'H'){$cnt_hydg_intf++;}
    elsif(grep(/$_/,@former_lists)  eq 1){$cnt_former_intf++;}
    #elsif(grep(/$_/,@modifier_lists) == 1){$cnt_modf_intf++;}
    elsif(grep(/$_/,@modifier_lists) eq 1){$cnt_modf_intf++;}
    else{$cnt_other_intf++;}
  }
  else                      #atom is in extended box
  {
    if($_ eq "O"){$cnt_oxyg_exbox++;}
    elsif($_ eq "H"){$cnt_hydg_exbox++;}
    elsif(grep(/$_/,@former_lists) eq 1){$cnt_former_exbox++;}
    elsif(grep(/$_/,@modifier_lists) eq 1){$cnt_modf_exbox++;}
    else{$cnt_other_exbox++;}
  }
}#end for
print "1021 $_$atom_id \t\$flg_interface=$flg_interface \t\$cnt_former_intf=$cnt_former_intf \t\$cnt_modf_intf=$cnt_modf_intf \t\$cnt_modf_intf=$cnt_modf_intf\t\$cnt_former_exbox=$cnt_former_exbox\t\$cnt_modf_exbox=$cnt_modf_exbox\n";
$tmp=<STDIN>; 

....

The result is shown below.

1021 SI6090 $flg_interface=0    $cnt_former_intf=0  $cnt_modf_intf=0    $cnt_former_exbox=0 $cnt_modf_exbox=1
1021 AL7235 $flg_interface=0    $cnt_former_intf=0  $cnt_modf_intf=0    $cnt_former_exbox=0 $cnt_modf_exbox=2
1021 CA8029 $flg_interface=0    $cnt_former_intf=0  $cnt_modf_intf=0    $cnt_former_exbox=0 $cnt_modf_exbox=3

Here, 1021 is just a label. In this output,

the 1st output SI6090 should have $cnt_former_exbox=1 instead of 0;
the 2nd output AL7235 should have $cnt_former_exbox=2 instead of 0;
the 3rd output CA8029 should have $cnt_modf_exbox=1 instead of 3.

Any suggestions or help would be highly appreciated.
I would also appreciate it if you could share a more efficient approach.

NOTE: My data set is large, so I have to consider run-time efficiency.
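
On the efficiency point, one common approach is to replace the per-atom grep calls with a single hash lookup. The following is only a minimal sketch, not the original program: the hash names %group_of and %count, the group labels, and the sample list @atom_names are all hypothetical and stand in for the real data. It assumes Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

# Lookup table built once, outside the loop.
# The group labels ('former', 'modifier', ...) are hypothetical names.
my %group_of = (
    SI => 'former',
    AL => 'former',
    CA => 'modifier',
    NA => 'modifier',
    O  => 'oxygen',
    H  => 'hydrogen',
);

# One counter per group instead of a separate scalar for each.
my %count = map { $_ => 0 } qw(former modifier oxygen hydrogen other);

# Hypothetical sample of atom names standing in for the real data.
my @atom_names = qw(SI AL CA O H NA SI XX);

for my $name (@atom_names) {
    # Single hash lookup per atom; unknown names fall into 'other'.
    my $group = $group_of{ uc $name } // 'other';
    $count{$group}++;
}

print "$_=$count{$_}\n" for sort keys %count;
```

A hash lookup does not scan the list, so the cost per atom stays roughly constant no matter how many names each group holds. Separate interface and extended-box counters could be kept by using two such %count hashes.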

Upvotes: 1

Views: 77

Answers (1)

Leon

Reputation: 468

Thanks to everybody for your time and help.

After searching online, I found that the grep expression was not correct. To test whether the array contains a given element, I need to use

            $tmp=uc($hash_type_spc{$hash_key});chomp($tmp);
            if($tmp eq 'O'){$cnt_oxyg_intf++;}
            elsif($tmp eq 'H'){$cnt_hydg_intf++;}
            #elsif($tmp eq "CA"){$cnt_modf_intf++;}
            elsif(grep { $tmp eq $_ } @former_lists){$cnt_former_intf++;}
            elsif(grep { $tmp eq $_ } @modifier_lists){$cnt_modf_intf++;}
            else{$cnt_other_intf++;}

Here, I changed $_=uc($hash_type_spc{$hash_key});chomp($_) to $tmp=...

With the grep expression changed as above, I get the correct result. However, I do not fully understand why this version works and the original did not. Any explanation would be highly appreciated.
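
A likely explanation, offered as a sketch rather than a definitive diagnosis: grep aliases $_ to each element of the list while it runs. In grep(/$_/, @former_lists), the pattern /$_/ is therefore built from the element currently being tested, so each element is matched against itself and the atom name stored in $_ beforehand is never consulted. The result is the number of elements in the list, not a membership test. Keeping the name in a separate variable ($tmp above) and comparing it with eq avoids the clash. The small script below illustrates the difference; the variable names $name, $count_regex and $count_eq are made up for the demonstration.

```perl
use strict;
use warnings;

my @former_lists = ('SI', 'AL');
my $name = 'CA';    # a name that is NOT in @former_lists

# Original form: the atom name is stored in $_ first, as in the question.
# Inside grep, $_ is aliased to each list element in turn, so /$_/
# matches every element against a pattern built from itself.
$_ = $name;
my $count_regex = grep(/$_/, @former_lists);

# Corrected form: the name lives in its own variable and is compared
# with eq against each element ($_ inside the block).
my $count_eq = grep { $name eq $_ } @former_lists;

print "regex form: $count_regex\n";   # prints "regex form: 2"
print "eq form:    $count_eq\n";      # prints "eq form:    0"
```

With a two-element list the regex form returns 2 even for a name that is absent, so a comparison such as "eq 1" depends on the list length rather than on the atom name. In the corrected form the count is used directly as a true/false value, which is why no comparison with 1 is needed.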

Upvotes: 1
