shaq

Reputation: 849

Grab multiple lines after a matching line in Perl iteratively?

I want to build a hash in which each line starting with > is a key, and the lines up to the next > form the value for that key:

while (<DATA>) {
    $line1 = $_;
    chomp($line1);

    if ($line1 =~ /^>/) {
        while (<DATA>) {
            last if $line1 =~ /^>/;
            $value .= $_;
        }
        $hash{$line1} = $value;
    }
}

foreach my $key(%hash) {
    print "$key :$hash{$key}\n";
}

__DATA__
>label1 
line1\n
line2\n
>label2
line1\n
line2\n

Upvotes: 3

Views: 687

Answers (2)

memowe

Reputation: 2668

I think using a key variable with outer scope is the easiest way:

Code

#!/usr/bin/env perl

use strict;
use warnings;
use feature 'say';

my $key = 'unlabeled';
my %value;

while (defined(my $line = <DATA>)) {
    chomp $line;

    # key line
    if ($line =~ /^>(.*)/) {
        $key = $1;
        next;
    }

    # value line
    push @{$value{$key}}, $line;
}

# sort the keys: hash order is not guaranteed
say "$_: " . join ', ' => @{$value{$_}} for sort keys %value;

__DATA__
>label1
line1
line2
>label2
line1
line2

Output

label1: line1, line2
label2: line1, line2

Explanation

We save the most recently seen hash key in the variable $key, which is scoped outside the line iteration loop. When the loop encounters another key line, it replaces the value of $key. If no array exists yet for the new key, Perl's autovivification mechanism creates a new array inside the hash on the first push. If the hash already holds something under $key, new lines are appended after the existing ones.
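To see the autovivification step in isolation, here is a minimal sketch of the same loop wrapped in a sub. The name group_lines and the sample input are made up for illustration and are not part of the answer above; the input deliberately repeats a label to show that later lines are appended to the existing array.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): groups lines under the most
# recently seen ">label" line and returns a reference to the resulting hash.
sub group_lines {
    my @lines = @_;
    my $key   = 'unlabeled';
    my %value;

    for my $line (@lines) {
        # key line: remember the label and move on
        if ($line =~ /^>(.*)/) {
            $key = $1;
            next;
        }
        # value line: the first time a key is seen, $value{$key} does not
        # exist yet; dereferencing it as an array makes Perl create it
        push @{ $value{$key} }, $line;
    }
    return \%value;
}

my $grouped = group_lines('>a', 'x', '>b', 'y', '>a', 'z');
print "$_: @{ $grouped->{$_} }\n" for sort keys %$grouped;
# a: x z
# b: y
```

Note that there is no "create the array if it is missing" check anywhere: the push alone is enough.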

Edit: if your input doesn't start with a key line, the first non-key lines will be added to the key unlabeled.

Edit 2: if you want the lines for a label concatenated, this can easily be done with join "\n" => @{$value{$_}}. If that is all you need, you can drop the arrays and use @perreal's solution, which is very similar but simply concatenates the lines.
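As a rough sketch of that join step, assuming %value already has the label => array-of-lines shape that the loop builds (the data below is hard-coded for illustration, and %joined is a made-up name):

```perl
use strict;
use warnings;

# Hypothetical data in the shape the loop produces: label => array ref of lines
my %value = (
    label1 => [ 'line1', 'line2' ],
    label2 => [ 'line1', 'line2' ],
);

# Collapse each array into a single newline-separated string
my %joined;
$joined{$_} = join "\n", @{ $value{$_} } for keys %value;

print "$_:\n$joined{$_}\n" for sort keys %joined;
```

Keeping the arrays until the end and joining only for output leaves the choice of separator open.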

Edit 3: yeah, autovivification! :) (Thanks @TLP)

Upvotes: 5

perreal

Reputation: 97968

use warnings;
use strict;

my %hash;
my $key;

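# read line by line instead of slurping the whole file into a list
while (<DATA>) {
  chomp;
  if (/^>(.*)$/)       { $key = $1; }          # label line: start a new key
  elsif (defined $key) { $hash{$key} .= $_ }   # value line: append to the current key
}

# sort for a stable order; hash order is not guaranteed
foreach my $label (sort keys %hash) {
  print "$label :$hash{$label}\n";
}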

__DATA__
>label1 
line1\n
line2\n
>label2
line1\n
line2\n

Upvotes: 1
