B Chen

Reputation: 923

How to properly use arrays instead of dynamic scalar variables

I have data like the line below. Each "number cluster" holds 5 elements separated by ":". I need to (1) retrieve the first element of each cluster and (2) group the retrieved elements three at a time.

chr1    69270   .   A   G   1/1:208,34:244:14.96:118,15,0   0/1:186,51:241:8.72:80,0,9  0/0:226,1:236:3.01:0,3,30   ./. 1/1:209,35:250:12:116,12,0  ./. 1/1:186,53:242:14.97:126,15,0   0/0:245,0:248:3.01:0,3,33   1/1:182,60:243:23.95:201,24,0

I am sure there is a better way to do it, but so far I could only think of brute force, which is always bad. The other option is to use dynamic scalars, but those would do essentially what the bad code below does, so I don't see much improvement, and others on Stack Overflow have said that using dynamic scalars is also always bad.

I am still reading Beginning Perl, so I don't know what other options are available. Any help will be appreciated.

my @genotype1 = split (/:/, $original_line[6]);
my @genotype2 = split (/:/, $original_line[7]);
my @genotype3 = split (/:/, $original_line[8]);
my @genotype4 = split (/:/, $original_line[9]);
my @genotype5 = split (/:/, $original_line[10]);
my @genotype6 = split (/:/, $original_line[11]);
my @genotype7 = split (/:/, $original_line[12]);
my @genotype8 = split (/:/, $original_line[13]);
my @genotype9 = split (/:/, $original_line[14]);
my @trio1 = ($genotype1[0], $genotype2[0], $genotype3[0]);
my @trio2 = ($genotype4[0], $genotype5[0], $genotype6[0]);
my @trio3 = ($genotype7[0], $genotype8[0], $genotype9[0]);  

Upvotes: 0

Views: 86

Answers (2)

ikegami

Reputation: 385897

If you were using "dynamic variables" (symbolic references, which use strict forbids), you would have

for my $i (6..14) {
   @{ "genotype".($i-6) } = split (/:/, $original_line[$i]);
}

Just change it to

my @genotypes;
for my $i (6..14) {
   @{ $genotypes[$i-6] } = split (/:/, $original_line[$i]);
}

which might be a bit cleaner as

my @genotypes;
for my $i (6..14) {
   $genotypes[$i-6] = [ split (/:/, $original_line[$i]) ];
}

or

my @genotypes;
for my $i (6..14) {
   push @genotypes, [ split (/:/, $original_line[$i]) ];
}

or

my @genotypes;
for (@original_line[6..14]) {
   push @genotypes, [ split /:/ ];
}

or

my @genotypes = map { [ split /:/ ] } @original_line[6..14];

But you only need the first element, so you can use

my @genotypes = map { ( split /:/ )[0] } @original_line[6..14];

Then, all you need is to grab three elements from that array at a time, so you get:

my @genotypes = map { ( split /:/ )[0] } @original_line[6..14];

my @trios;
while (@genotypes) {
   push @trios, [ splice @genotypes, 0, 3 ];
}
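
For reference, here is a minimal self-contained sketch of the same approach run against the sample row from the question. The `$first_sample` variable is a hypothetical name introduced for illustration. It is set to 5 on the assumption that the row has exactly the five leading fields shown in the post; the question's own code starts at index 6, so adjust it to match the real file layout.

```perl
use strict;
use warnings;

# Sample row from the question, already split into fields.
my @original_line = (
    'chr1', '69270', '.', 'A', 'G',
    '1/1:208,34:244:14.96:118,15,0',
    '0/1:186,51:241:8.72:80,0,9',
    '0/0:226,1:236:3.01:0,3,30',
    './.',
    '1/1:209,35:250:12:116,12,0',
    './.',
    '1/1:186,53:242:14.97:126,15,0',
    '0/0:245,0:248:3.01:0,3,33',
    '1/1:182,60:243:23.95:201,24,0',
);

# Assumption: genotype columns start at index 5 in this sample row.
my $first_sample = 5;

# Keep only the first ":"-separated element of each genotype column.
my @genotypes = map { ( split /:/ )[0] }
                @original_line[ $first_sample .. $#original_line ];

# Move three elements at a time from @genotypes into an array of arrays.
my @trios;
while (@genotypes) {
    push @trios, [ splice @genotypes, 0, 3 ];
}

# Print one trio per line.
print join( ' ', @$_ ), "\n" for @trios;
```

Note that splice removes the elements it returns, so @genotypes is empty once the loop finishes. Work on a copy if the flat list is still needed afterwards.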

Upvotes: 3

jwodder

Reputation: 57490

There are several different ways to make your code more efficient; most (all?) would take advantage of the fact that you're only ever using the first element of each @genotype list. One example:

my @elements = map { (split /:/)[0] } @original_line[6..14];
my @trio1 = @elements[0,1,2];
my @trio2 = @elements[3,4,5];
my @trio3 = @elements[6,7,8];
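
If the number of groups is not fixed at three, the hard-coded slices can be computed instead. The following is only a sketch under assumptions: `$size` and `@trios` are illustrative names, the `@elements` values are the first elements taken by hand from the sample row in the question, and the element count is assumed to be a multiple of `$size` (leftover elements are ignored).

```perl
use strict;
use warnings;

# First elements of the nine genotype columns in the question's sample row.
my @elements = ( '1/1', '0/1', '0/0', './.', '1/1', './.', '1/1', '0/0', '1/1' );

# Hypothetical group size; 3 matches the trios in the question.
my $size = 3;

# Build one array reference per group by slicing @elements with computed
# indices. Unlike splice, this leaves @elements untouched.
my @trios = map {
    my $start = $_ * $size;
    [ @elements[ $start .. $start + $size - 1 ] ]
} 0 .. int( @elements / $size ) - 1;

# Print one group per line.
print join( ' ', @$_ ), "\n" for @trios;
```

Each group is then reachable as `$trios[0]`, `$trios[1]`, and so on, rather than through separately named arrays.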

Upvotes: 3
