Parsing a structured text file in Perl

Question

I'm quite new to Perl and I'm having immense difficulty writing a Perl script that will successfully parse a structured text file.

I have a collection of files that look like this:

name:
    John Smith
occupation:
    Electrician
date of birth:
    2/6/1961
hobbies:
    Boating
    Camping
    Fishing

And so on. The field name is always followed by a colon, and all the data associated with those fields is always indented by a single tab (\t).

I would like to create a hash that will directly associate the field contents with the field name, like this:

 $contents{$name} = "John Smith"
 $contents{$hobbies} = "Boating, Camping, Fishing"

Or something along those lines.

So far I've been able to get all the field names into a hash by themselves, but I've not had any luck wrangling the field data into a form that can be nicely stored in a hash. Clearly substituting/splitting newlines followed by tabs won't work (I've tried, somewhat naively). I've also tried a crude lookahead where I create a duplicate array of lines from the file and use that to figure out where the field boundaries are, but it's not that great in terms of memory consumption.

FWIW, currently I'm going through the file line by line, but I'm not entirely convinced that this is the best solution. Is there any way to do this parsing in a straightforward manner?

hmatt1 · Accepted Answer

Reading the file line by line is a good way to go. Here I am creating a hash of array references, with one array of values per field name. This shows how you would read just one file. You could read each file this way and put each hash of arrays into an outer hash, giving a hash of hashes of arrays.

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my %contents;
my $key;
while (<DATA>) {
    chomp;
    if ( s/:\s*$// ) {
        $key = $_;
    } else {
        s/^\s+//; # strip the leading indentation
        push @{$contents{$key}}, $_;
    }
}
print Dumper \%contents;

__DATA__
name:
    John Smith
occupation:
    Electrician
date of birth:
    2/6/1961
hobbies:
    Boating
    Camping
    Fishing

Output:

$VAR1 = {
          'occupation' => [
                             'Electrician'
                           ],
          'hobbies' => [
                          'Boating',
                          'Camping',
                          'Fishing'
                        ],
          'name' => [
                       'John Smith'
                     ],
          'date of birth' => [
                                '2/6/1961'
                              ]
        };
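
If you want plain strings such as "Boating, Camping, Fishing" rather than array references, and one entry per file, something like the following might work. This is only a sketch under a few assumptions: `parse_record` and `%records` are names made up for illustration, field names are assumed to start in the first column while values are indented, and the sample data is read from an in-memory filehandle so the script runs without any files on disk.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# parse_record is a hypothetical helper, not part of the script above.
# It reads "field:" lines and indented value lines from an open filehandle
# and returns a hash reference mapping each field name to a single string.
sub parse_record {
    my ($fh) = @_;
    my (%fields, $key);
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;           # skip blank lines
        if ($line =~ /^(\S.*?):\s*$/) {      # unindented line ending in a colon
            $key = $1;
        } elsif (defined $key) {
            $line =~ s/^\s+|\s+$//g;         # trim indentation and trailing space
            push @{ $fields{$key} }, $line;
        }
    }
    # Collapse each list of values into one comma-separated string.
    my %joined;
    $joined{$_} = join(', ', @{ $fields{$_} }) for keys %fields;
    return \%joined;
}

# Sample data in the question's format (assumed tab-indented values).
my $sample = "name:\n\tJohn Smith\nhobbies:\n\tBoating\n\tCamping\n\tFishing\n";

# Opening a reference to a scalar gives an in-memory filehandle.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my %records = ( sample => parse_record($fh) );   # one entry per "file"
close $fh;

print "$records{sample}{name}\n";      # John Smith
print "$records{sample}{hobbies}\n";   # Boating, Camping, Fishing
```

With real files you would open each path instead of `\$sample` and store the result under the file name, for example `$records{$file} = parse_record($fh)`. Keeping the values in arrays until the end and joining them once avoids repeated string concatenation, and you can skip the join entirely if you prefer the hash of arrays.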
