pdubois

Reputation: 7800

Parsing data with multiple record separators into a hash of arrays of arrays

With this code and data:

use strict;
use warnings;
use Data::Dumper;

my %hash;
local $/ = '###############';

while (<DATA>) {
   chomp;
   my @items = split "\n"  or next;
   chomp(@items);
   print Dumper \@items;
}


__DATA__
###############
0 FOO 
entry 1003
entry 1001
entry 9999
---------------
entry 3333
entry 7777
###############
1 BAR 
entry 2222
entry 2001

I wish to create the following data structure:

   $VAR = {
             '0 FOO'  => [['entry 1001','entry 9999'],['entry 7777']],
             '1 BAR'  =>  [['entry 2001']]
           }

So:

  1. Each chunk (one hash key) is delimited by a line of "####".

  2. Skip the 3rd line of every chunk, counting the "####" line (e.g. 1003 and 2222).

  3. The inner arrays are separated by a line of "---".

  4. Within each block separated by "----", store only the 2nd member onwards.

How can I do this?

Running code here: https://eval.in/90200

Upvotes: 0

Views: 220

Answers (1)

Borodin

Reputation: 126742

I find it is rarely helpful to read entire data blocks into memory at once, because you then still have to split them, remove newlines and so on. It is usually easier to maintain some simple state and process the file line by line.

This code does what you need. It works by starting a new hash element whenever a line of hashes is found. The line that follows is read as the new key and copied into the variable $key, and the hash value for that key is set to [ [ ] ]. The next data line is discarded.

Similarly, a line of hyphens adds another empty array onto the end of the current hash value, and the next data line is discarded.

Any other lines are just added to the last array in the current hash value.

I have used Data::Dump in preference to Data::Dumper as I find it produces far more readable output. However, it is not a core module, so you will probably need to install it if you wish to follow suit.

use strict;
use warnings;

use Data::Dump;

my %data;
my $key;
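while (<DATA>) {

  chomp;

  if (/^#+$/) {
    # Line of hashes: the next line is the key of a new chunk
    chomp ($key = <DATA>);
    $data{$key} = [ [ ] ];
    <DATA>;    # discard the first entry of the chunk
  }
  elsif (/^-+$/) {
    # Line of hyphens: start a new inner array
    push @{ $data{$key} }, [ ];
    <DATA>;    # discard the first entry of the block
  }
  else {
    # Any other line belongs to the most recent inner array
    push @{ $data{$key}[-1] }, $_;
  }
}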

dd \%data;


__DATA__
###############
0 FOO 
entry 1003
entry 1001
entry 9999
---------------
entry 3333
entry 7777
###############
1 BAR 
entry 2222
entry 2001

Output (note that the keys keep the trailing space present in the data lines):

{
  "0 FOO " => [["entry 1001", "entry 9999"], ["entry 7777"]],
  "1 BAR " => [["entry 2001"]],
}
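
The keys in this output end with a space because the key lines in the data do, whereas the question asked for '0 FOO' and '1 BAR'. If that matters, the following is a minimal sketch of the same line-by-line idea with the key trimmed. It is a variation for illustration, not the code above: the function name parse_records and the in-memory filehandle are assumptions, it guards against input that ends straight after a separator, and it uses the core Data::Dumper module so that nothing needs to be installed.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper (not part of the answer above): read records from a
# filehandle and return a reference to a hash of arrays of arrays.
sub parse_records {
    my ($fh) = @_;
    my (%data, $key);

    while (my $line = <$fh>) {
        chomp $line;

        if ($line =~ /^#+$/) {
            # Line of hashes: the following line is the key of a new chunk.
            $key = <$fh>;
            last unless defined $key;    # input ended right after a separator
            $key =~ s/\s+\z//;           # remove the newline and trailing spaces
            $data{$key} = [ [] ];
            <$fh>;                       # discard the first entry of the chunk
            next;
        }

        # Ignore anything that appears before the first line of hashes.
        next unless defined $key;

        if ($line =~ /^-+$/) {
            # Line of hyphens: start a new inner array and skip its first entry.
            push @{ $data{$key} }, [];
            <$fh>;
        }
        else {
            # Any other line belongs to the most recent inner array.
            push @{ $data{$key}[-1] }, $line;
        }
    }

    return \%data;
}

# Same sample data as above, with the trailing space on the key lines
# written out explicitly.
my $text = join "\n",
    '###############', '0 FOO ', 'entry 1003', 'entry 1001', 'entry 9999',
    '---------------', 'entry 3333', 'entry 7777',
    '###############', '1 BAR ', 'entry 2222', 'entry 2001';

# Read the string through an in-memory filehandle.
open my $fh, '<', \$text or die "Cannot open in-memory filehandle: $!";
my $data = parse_records($fh);
close $fh;

$Data::Dumper::Sortkeys = 1;    # print the hash keys in sorted order
print Dumper($data);
```

With the sample data, the keys of the resulting hash are '0 FOO' and '1 BAR', without the trailing space. Passing a filehandle to the function rather than reading DATA directly means the same code can also be pointed at a real file opened with open.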

Upvotes: 2
