Reputation: 23
I have an array (@myarray) with strings as such:
rs30000489
rs903484
rs24567;rs324987;rs234985
rs5905002
rs32456;rs2349085
When I match it against a similar array (@otherarray), whose strings contain no semicolons and only a single rsID each, using the following code:
for $1(0 .. $#otherarray) {
    for $m(1 .. $#myarray) {
        if ($myarray[$m] =~ /$otherarray[$i]/i) {
            $IDmatch = 1;
        }
    }
}
The script does not match any of the IDs within strings with semicolons. I tried pulling out the strings that contain semicolons like so:
foreach $string (@myarray) {
    if ($string =~ m/;/) {
        push (@newarray, $string);
    }
}
Which returns the array @new:
rs24567;rs324987;rs234985
rs32456;rs2349085
I then try to split these on a common character like so:
foreach $line (@new) {
    $line =~ tr/;//d;
    $line =~ s/rs/ rs/g;
    $line = split (/ /);
}
But when I print the @new array, it just contains zeros. I know this must have something to do with my loop, because I have trouble working with loops in Perl. Please let me know if you have any ideas! Thanks!
Upvotes: 1
Views: 82
Reputation: 3601
If you're looking for unique items, the first thing you should think of is hashes. Try:
#!/usr/bin/env perl
use strict;
use warnings;
# --------------------------------------
use charnames qw( :full :short );
use English qw( -no_match_vars ); # Avoids regex performance penalty
use Data::Dumper;
# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent = 1;
# Set maximum depth for Data::Dumper, zero means unlimited
local $Data::Dumper::Maxdepth = 0;
# conditional compile DEBUGging statements
# See http://lookatperl.blogspot.ca/2013/07/a-look-at-conditional-compiling-of.html
use constant DEBUG => $ENV{DEBUG};
# --------------------------------------
# Name: unique
# Usage: %hash = unique( @array );
# Purpose: Create a hash of unique keys from array items.
# Parameters: @array -- May have multiple entries separated by a semi-colon
# Returns: %hash -- Unique keys of array items
#
sub unique {
    my @array = @_;
    my %hash = ();
    for my $item ( @array ){
        my @items = split m{ \; }msx, $item;
        $hash{$_} ++ for @items;
    }
    return %hash;
}
# --------------------------------------
my @myarray = qw(
rs30000489
rs903484
rs24567;rs324987;rs234985
rs5905002
rs32456;rs2349085
);
my @otherarray = qw(
rs3249487
rs30000489
rs325987
rs324987
rs234967
rs32456
rs234567
);
my %my_hash = unique( @myarray );
print Dumper \%my_hash if DEBUG;
my %other_hash = unique( @otherarray );
print Dumper \%other_hash if DEBUG;
my %intersection = ();
for my $item ( keys %my_hash ){
    if( exists $other_hash{$item} ){
        $intersection{$item} ++;
    }
}
print Dumper \%intersection if DEBUG;
Upvotes: 0
Reputation: 126722
You don't say what you want to do with these two arrays, but if I understand your question properly, it sounds like you want to find all the rsIDs that appear in both lists.
This program works by converting the first array (please use better names than myarray and otherarray) into a hash that has all the IDs as keys. Then it uses grep to find all those in the second array that appear in the hash, storing them in the array @dups.
use strict;
use warnings;
my @myarray = qw(
rs30000489
rs903484
rs24567;rs324987;rs234985
rs5905002
rs32456;rs2349085
);
my @otherarray = qw(
rs3249487
rs30000489
rs325987
rs324987
rs234967
rs32456
rs234567
);
my %rsids = map { $_ => 1 } map { split /;/ } @myarray;
my @dups = grep $rsids{$_}, @otherarray;
print "$_\n" for @dups;
output
rs30000489
rs324987
rs32456
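One further reason to prefer the hash lookup over the unanchored regex in the question is that a pattern match also succeeds on partial IDs. Here is a minimal sketch of the difference; the IDs are made up for illustration and exact_matches is a hypothetical helper, not part of the program above:

```perl
use strict;
use warnings;

# Hypothetical helper: return the IDs from @$candidates that appear as
# complete IDs in @$records (a record may hold several IDs joined by ';').
sub exact_matches {
    my ( $records, $candidates ) = @_;
    my %seen = map { $_ => 1 } map { split /;/ } @$records;
    return grep { $seen{$_} } @$candidates;
}

# Made-up sample data
my @records    = qw( rs24567;rs324987 rs5905002 );
my @candidates = qw( rs324 rs324987 rs590 );

# Unanchored regex: a candidate counts as a hit if it occurs anywhere
# inside a record, so partial IDs match too.
my @regex_hits;
for my $id (@candidates) {
    push @regex_hits, $id if grep { /\Q$id\E/ } @records;
}

# Hash lookup: only complete IDs match.
my @exact_hits = exact_matches( \@records, \@candidates );

print "regex: @regex_hits\n";
print "exact: @exact_hits\n";
```

With this sample data the regex version reports all three candidates, while the hash version reports only rs324987.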
Upvotes: 2
Reputation:
Just to add a few things about loops in Perl: you wrote the for loop in two different ways.
for $1(0 .. $#otherarray) { ... }
and
foreach $line (@new) { ... }
You can write the first loop in exactly the same way as the second:
foreach $1 ( 0..$#otherarray) { .. }
or much better
foreach my $other_array_content ( @otherarray) { .. }
The $1 you used is a special variable (it holds the first capture group of a regular expression match), so it should not be used as a loop variable.
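To illustrate what $1 is normally for: Perl sets it to the text captured by the first pair of parentheses in the most recent successful match. A small sketch (the ID is just a sample value, and $digits is a name chosen for this example):

```perl
use strict;
use warnings;

my $id = 'rs324987';    # sample value
my $digits;

# After a successful match, $1 holds what the first capture group matched.
if ( $id =~ /^rs(\d+)$/ ) {
    $digits = $1;
    print "numeric part: $digits\n";
}
```

Because Perl manages $1 itself, an ordinary lexical such as `my $i` is the safer choice for a loop counter.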
Then you can use the split inside the foreach loop as well.
foreach my $data (@data) {
foreach (split /;/,$data) {
}
}
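This probably also explains the zeros in the question: `split (/ /)` without a second argument splits `$_` rather than `$line`, and assigning the result to a scalar stores the number of fields instead of the fields themselves. A sketch of the difference, where flatten_ids is a hypothetical helper introduced only for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: expand ';'-joined entries into one flat list of IDs.
sub flatten_ids {
    return map { split /;/ } @_;
}

my $line = 'rs24567;rs324987;rs234985';

my $count  = split /;/, $line;    # scalar context: number of fields
my @fields = split /;/, $line;    # list context: the fields themselves

print "count:  $count\n";
print "fields: @fields\n";

# Single IDs pass through unchanged; joined entries are expanded.
print "$_\n" for flatten_ids( 'rs903484', $line );
```

Assigning to an array (or using split directly as the list of a foreach or map) keeps the IDs; assigning to a scalar keeps only the count.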
Here is a short solution to your problem in a nutshell:
my @checked = qw(rs30000489
rs9033484
rs2349285
rs5905402
rs32456
);
my $idMatch = 0;
chomp( my @data = <DATA> );    # strip trailing newlines so that eq can match
foreach my $data ( @data ) {
    foreach my $checked ( @checked ) {
        if ( $data =~ m/;/ ) {
            # several IDs on one line: compare each of them
            foreach my $data2 ( split /;/, $data ) {
                if ( $checked eq $data2 ) {
                    $idMatch = 1;
                }
            }
        } else {
            if ($data eq $checked) {
                $idMatch = 1;
            }
        }
    }
}
print $idMatch;
__DATA__
rs30000489
rs903484
rs24567;rs324987;rs234985
rs5905002
rs32456;rs2349085
Upvotes: 0