user2560686

Reputation: 23

Splitting specific strings in an array?

I have an array (@myarray) with strings as such:

rs30000489
rs903484
rs24567;rs324987;rs234985
rs5905002
rs32456;rs2349085

When I match it against another, similar array (@otherarray), whose strings each hold a single rsID and no semicolons, using the following code:

for $1(0 .. $#otherarray) {
    for $m(1 .. $#myarray) {
        if ($myarray[$m] =~ /$otherarray[$i]/i) { 
            $IDmatch = 1;
        }
    }
}

The script does not match any of the IDs inside the strings that contain semicolons. I tried splitting the semicolon strings like so:

foreach $string (@myarray) {
    if ($string =~ m/;/) {
        push (@newarray, $string);
    }
}

Which returns the array @new:

rs24567;rs324987;rs234985
rs32456;rs2349085

I then try to split it on a common character like so:

foreach $line (@new) {
    $line =~ tr/;//d;
    $line =~ s/rs/ rs/g;
    $line = split (/ /);
}

But when I print the @new array, it just contains zeros. I know this must have something to do with my loop, because I have trouble working with loops in Perl. Please let me know if you have any ideas! Thanks!

Upvotes: 1

Views: 82

Answers (3)

shawnhcorey

Reputation: 3601

If you're looking for unique items, the first thing you should think of is hashes. Try:

#!/usr/bin/env perl

use strict;
use warnings;

# --------------------------------------

use charnames qw( :full :short   );
use English   qw( -no_match_vars );  # Avoids regex performance penalty

use Data::Dumper;

# Make Data::Dumper pretty
$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Indent   = 1;

# Set maximum depth for Data::Dumper, zero means unlimited
local $Data::Dumper::Maxdepth = 0;

# conditional compile DEBUGging statements
# See http://lookatperl.blogspot.ca/2013/07/a-look-at-conditional-compiling-of.html
use constant DEBUG => $ENV{DEBUG};

# --------------------------------------
#       Name: unique
#      Usage: %hash = unique( @array );
#    Purpose: Create a hash of unique keys from array items.
# Parameters: @array -- May have multiple entries separated by a semi-colon
#    Returns:  %hash -- Unique keys of array items
#
sub unique {
  my @array = @_;
  my %hash  = ();

  for my $item ( @array ){
    my @items = split m{ \; }msx, $item;
    $hash{$_} ++ for @items;
  }

  return %hash;
}

# --------------------------------------

my @myarray = qw(
  rs30000489
  rs903484
  rs24567;rs324987;rs234985
  rs5905002
  rs32456;rs2349085
);

my @otherarray = qw(
  rs3249487
  rs30000489
  rs325987
  rs324987
  rs234967
  rs32456
  rs234567
);

my %my_hash = unique( @myarray );
print Dumper \%my_hash if DEBUG;

my %other_hash = unique( @otherarray );
print Dumper \%other_hash if DEBUG;

my %intersection = ();
for my $item ( keys %my_hash ){
  if( exists $other_hash{$item} ){
    $intersection{$item} ++;
  }
}
print Dumper \%intersection if DEBUG;
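
Note that the program above prints only when the DEBUG environment variable is set to a true value, because every print is guarded by `if DEBUG`. If you always want to see the common IDs, something along these lines should work. It is only a minimal sketch: the two arrays are shortened sample data, and `@common` is a name I made up for the result.

```perl
use strict;
use warnings;

# Shortened sample data (assumption: same shape as the arrays above).
my @myarray    = ( 'rs30000489', 'rs24567;rs324987;rs234985' );
my @otherarray = ( 'rs30000489', 'rs324987', 'rs234567' );

# Count every individual rsID, splitting semicolon-joined entries first.
my ( %my_hash, %other_hash );
$my_hash{$_}++    for map { split /;/ } @myarray;
$other_hash{$_}++ for map { split /;/ } @otherarray;

# Keep the IDs present in both hashes; sorting first gives a stable order.
my @common = grep { exists $other_hash{$_} } sort keys %my_hash;

print "$_\n" for @common;
```

With this sample data it prints rs30000489 and rs324987, one per line.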

Upvotes: 0

Borodin

Reputation: 126722

You don't say what you want to do with these two arrays, but if I understand your question properly, it sounds like you want to find all the rsIDs that appear in both lists.

This program works by converting the first array (please use better names than myarray and otherarray) into a hash that has all the individual IDs as keys. Then it uses grep to find all the elements of the second array that appear in the hash, storing them in the array @dups.

use strict;
use warnings;

my @myarray = qw(
  rs30000489
  rs903484
  rs24567;rs324987;rs234985
  rs5905002
  rs32456;rs2349085
);

my @otherarray = qw(
  rs3249487
  rs30000489
  rs325987
  rs324987
  rs234967
  rs32456
  rs234567
);

my %rsids = map { $_ => 1 } map { split /;/ } @myarray;

my @dups = grep $rsids{$_}, @otherarray;

print "$_\n" for @dups;

output

rs30000489
rs324987
rs32456
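
A side benefit of the hash lookup is that it only matches whole IDs. The unanchored regex in the question, `$myarray[$m] =~ /$otherarray[$i]/i`, is true whenever one ID occurs anywhere inside the other string, so a shorter ID can match a longer one by accident. Here is a small sketch of the difference. The ID `rs3245` is made up for illustration and is not in your data, and `%result` is just a helper name.

```perl
use strict;
use warnings;

my @myarray = ( 'rs24567;rs324987;rs234985', 'rs32456;rs2349085' );

# Exact lookup table: one key per individual rsID.
my %rsids = map { $_ => 1 } map { split /;/ } @myarray;

my %result;
for my $id ( 'rs3245', 'rs32456' ) {    # 'rs3245' is a made-up ID
    # Unanchored regex: true if $id occurs anywhere inside an element.
    my $regex_hit = grep { /\Q$id\E/i } @myarray;

    # Hash lookup: true only if $id is exactly one of the stored IDs.
    my $exact_hit = exists $rsids{$id};

    $result{$id} = [ $regex_hit ? 'yes' : 'no', $exact_hit ? 'yes' : 'no' ];
    print "$id: regex=$result{$id}[0] exact=$result{$id}[1]\n";
}
```

The made-up `rs3245` is reported as a regex match (it is a prefix of `rs32456`) but not as an exact match, while `rs32456` matches both ways.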

Upvotes: 2

user1558455

Reputation:

Just a few things about loops in Perl. You wrote the for loop in two different ways:

for $1(0 .. $#otherarray) { ... }

and

foreach $line (@new) { ... }

You can write the first loop in exactly the same way as the second:

foreach $1 ( 0..$#otherarray) { .. }

or much better

foreach my $other_array_content ( @otherarray) { .. }

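The $1 you used is a special variable (it holds the first capture group of the last successful regular expression match), so it is best not to use it as a loop variable.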

Then you can use the split inside the foreach loop as well.

foreach my $data (@data) {
  foreach (split /;/,$data) {

  }
}
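
This is also where the zeros in your @new array come from. In `$line = split (/ /);` the result is assigned to a scalar, and in scalar context split returns the number of fields, not the fields themselves. On top of that, split with only a pattern works on `$_`, which your loop never sets because its loop variable is `$line`, so the count is 0. Here is a minimal sketch of the two contexts, assuming a reasonably recent Perl and the two semicolon strings from your question. `@ids` and `@counts` are names I picked for the example.

```perl
use strict;
use warnings;

my @new = ( 'rs24567;rs324987;rs234985', 'rs32456;rs2349085' );

my @ids;       # every individual rsID, in order
my @counts;    # number of IDs found in each string

for my $line (@new) {
    # List context: split returns the fields themselves.
    my @fields = split /;/, $line;

    # Scalar context: split returns only the number of fields.
    my $count = split /;/, $line;

    push @ids,    @fields;
    push @counts, $count;
}

print "@counts\n";        # field counts per string
print "$_\n" for @ids;    # one rsID per line
```

The first line printed is `3 2`, followed by the five individual rsIDs. Collecting the fields into a separate array also avoids overwriting the elements of @new while looping over them.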

Here is a short solution to your problem:

my @checked = qw(rs30000489
  rs9033484
  rs2349285
  rs5905402
  rs32456
);

my $idMatch = 0;
chomp( my @data = <DATA> );    # strip trailing newlines so the eq comparisons can match

foreach my $data ( @data ) {
  foreach my $checked ( @checked ) {
    if ( $data =~ m/;/ ) {
      foreach my $data2 ( split /;/, $data ) {
        if ( $checked eq $data2 ) {
          $idMatch = 1;
        }
      }
    } else {
      if ($data eq $checked) {
        $idMatch = 1;
      }
    }
  }
}

print $idMatch;

__DATA__

rs30000489
rs903484
rs24567;rs324987;rs234985
rs5905002
rs32456;rs2349085

Upvotes: 0
