Reading CSV-ish type records of varying length from a file

Question

The file looks like this:

Nolan, Randall|(XYZ) {
  Bronco,
  Patient,
  New,
}
Tryor, Neil|(ABC) {
  Doyle,
  Agg,
}
Daniel, Liam|(ABC)
Taylor, Greg|(XYZ)

Notes about the records: The last two lines above show what constitutes the ID of a record, which has the form Last_name, First_name|(CODE). Optionally, a record can be followed by a pair of braces. Inside the braces are items separated by commas, each put on its own line so the file is easier to read; the comma remains the separator. A record with no items has no braces at all (like the last two records above). If braces are present, they contain between 1 and n items, with n >= 1 (like the first two records).
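To make these rules concrete, here is a rough Perl sketch that sorts a single chomped line into one of the four kinds of line the format allows. It assumes the layout of the example exactly (one item per line ending in a comma, a lone `}` closing the block); the sub `classify_line` and its return labels are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: label one chomped line of the file.
# Each pattern is anchored at both ends, so a line matches at most one of them.
sub classify_line {
    my ($line) = @_;
    return 'header_open' if $line =~ /^[^|,]+,\s*[^|]+\|\([^)]*\)\s*\{\s*$/;  # "Last, First|(CODE) {"
    return 'header'      if $line =~ /^[^|,]+,\s*[^|]+\|\([^)]*\)\s*$/;       # "Last, First|(CODE)"
    return 'close'       if $line =~ /^\s*\}\s*$/;                            # "}"
    return 'item'        if $line =~ /^\s*[^,{}|]+,\s*$/;                     # "  Item,"
    return 'unknown';
}

printf "%-25s %s\n", $_, classify_line($_)
    for 'Nolan, Randall|(XYZ) {', '  Bronco,', '}', 'Daniel, Liam|(ABC)';
```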

What I want to do is take all of the information for every person and process it this way:

Take the mandatory information, i.e. the fields before the braces (always present and consisting of three fields), and use it as an ID for what comes next: the items grouped between the braces.
The fields before the braces are not to be kept together as one string; each is split out into Last_name, First_name and Code.
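Splitting the header into its three fields can be done with a single regex match. The following is a minimal sketch assuming the `Last_name, First_name|(CODE)` shape with an optional trailing `{`; `parse_header` and the hash keys it returns are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: split a header line into its three fields.
# Assumes the "Last_name, First_name|(CODE)" shape, optionally followed by "{".
sub parse_header {
    my ($line) = @_;
    my ($last, $first, $code) =
        $line =~ /^\s*([^,|]+?)\s*,\s*([^|]+?)\s*\|\s*\(([^)]*)\)\s*\{?\s*$/
        or return;    # not a header line
    return { last => $last, first => $first, code => $code };
}

my $rec = parse_header('Nolan, Randall|(XYZ) {');
print "$rec->{last} / $rec->{first} / $rec->{code}\n";    # prints "Nolan / Randall / XYZ"
```

A line that does not look like a header (for example an item line) makes `parse_header` return undef, which the caller can treat as a validation failure.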

The problem is that I don't know in advance how many items each record has between its braces, and I want to use all the fields of a record in the further processing I do on what I extract from the file.

A solution that I thought of is to have an array of hashes like this:

my $records = [
    {   # one hash per record
        First_Name => 'Bob',
        Last_Name  => 'Dolan',
        Code       => 'XYZ',
        Items      => [ 'item1', 'item2' ],    # zero or more items
    },
    # ... more records ...
];

This way I'd have all the data I want in one place, but I would then need to iterate through the data structure to process it.
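For comparison, walking such an array of hashes after the whole file has been read could look roughly like this. The values come from the example above and the data file; `summarize` is a hypothetical helper, and a record without braces is assumed to simply have no `Items` key.

```perl
use strict;
use warnings;

# The array-of-hashes layout from the question (illustrative values).
my $records = [
    { First_Name => 'Bob',  Last_Name => 'Dolan',  Code => 'XYZ', Items => [ 'item1', 'item2' ] },
    { First_Name => 'Liam', Last_Name => 'Daniel', Code => 'ABC' },    # no braces, no Items key
];

# Hypothetical helper: build a one-line summary of a record.
sub summarize {
    my ($rec) = @_;
    my @items = @{ $rec->{Items} || [] };    # treat a missing Items key as an empty list
    return sprintf '%s, %s (%s): %s',
        $rec->{Last_Name}, $rec->{First_Name}, $rec->{Code},
        @items ? join(', ', @items) : 'no items';
}

# Second pass: walk the structure after the whole file has been read.
print summarize($_), "\n" for @$records;
```

The cost of this approach is that every record stays in memory until the second pass runs.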

That seems like a primitive solution. What would be a better way to process the data on the fly, as I read it from the file, while still checking and validating the information in the optional braces section?

ThisSuitIsBlackNot · Accepted Answer

The following reads the data line by line, storing it in a hash as it goes. When the end of a record is reached (either ) or } at the end of a line), you can process the hash (I simply print it with Data::Dumper).

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

my %record;
while (<DATA>) {    # read the lines after __DATA__ one at a time into $_
    chomp;

    # Assumes that none of the lines inside braces can contain "|"   
    if (/\|/) {
        my ($name, $code) = split /\|/;
        my ($last, $first) = split /,\s*/, $name;
        ($code) = ($code =~ /\(([^)]*)\)/);    # keep what is inside the parentheses: "(XYZ) {" -> "XYZ"

        $record{first} = $first;
        $record{last} = $last;
        $record{code} = $code;
    }
    elsif (/,$/) {
        s/\s+//g;
        s/,//g;
        push @{ $record{items} }, $_;
    }

    # End of record, process it
    if (/[})]$/) {
        print Dumper(\%record);

        # Clear record after processing
        %record = ();
    }
}

__DATA__
Nolan, Randall|(XYZ) {
  Bronco,
  Patient,
  New,
}
Tryor, Neil|(ABC) {
  Doyle,
  Agg,
}
Daniel, Liam|(ABC)
Taylor, Greg|(XYZ)

Output:

$VAR1 = {
          'first' => 'Randall',
          'last' => 'Nolan',
          'code' => 'XYZ',
          'items' => [
                       'Bronco',
                       'Patient',
                       'New'
                     ]
        };
$VAR1 = {
          'first' => 'Neil',
          'last' => 'Tryor',
          'code' => 'ABC',
          'items' => [
                       'Doyle',
                       'Agg'
                     ]
        };
$VAR1 = {
          'first' => 'Liam',
          'last' => 'Daniel',
          'code' => 'ABC'
        };
$VAR1 = {
          'first' => 'Greg',
          'last' => 'Taylor',
          'code' => 'XYZ'
        };
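The loop above trusts its input. Since the question also asks for validation of the braces section, here is a hedged sketch of the same line-by-line idea that hands each finished record to a callback and dies on malformed input (an empty `{ }` block, an unexpected line inside braces, or a block that is never closed). The name `read_records`, its callback argument and the error messages are hypothetical; it also assumes the last item may omit its trailing comma, which the example file does not show.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: stream records from $fh, calling $on_record->(\%rec)
# for each complete record. Dies with a message on malformed input.
sub read_records {
    my ($fh, $on_record) = @_;
    my $rec;            # record currently being built
    my $in_braces = 0;  # true while inside a { ... } block

    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # tolerate blank lines

        if ($in_braces) {
            if ($line =~ /^\s*\}\s*$/) {                     # end of the item block
                die "line $.: empty item block\n" unless @{ $rec->{items} };
                $on_record->($rec);
                ($rec, $in_braces) = (undef, 0);
            }
            elsif ($line =~ /^\s*([^,{}|]+?)\s*,?\s*$/) {    # one item per line
                push @{ $rec->{items} }, $1;
            }
            else {
                die "line $.: unexpected line inside braces: '$line'\n";
            }
        }
        elsif (my ($last, $first, $code, $brace) =
               $line =~ /^\s*([^,|]+?)\s*,\s*([^|]+?)\s*\|\s*\(([^)]*)\)\s*(\{)?\s*$/) {
            $rec = { last => $last, first => $first, code => $code };
            if (defined $brace) {       # items follow on the next lines
                ($rec->{items}, $in_braces) = ([], 1);
            }
            else {                      # a header-only record is already complete
                $on_record->($rec);
                $rec = undef;
            }
        }
        else {
            die "line $.: expected 'Last, First|(CODE)', got '$line'\n";
        }
    }
    die "end of input inside an unclosed item block\n" if $in_braces;
    return;
}

# Usage: an in-memory filehandle stands in for the real file here.
my $text = <<'END';
Nolan, Randall|(XYZ) {
  Bronco,
  Patient,
  New,
}
Daniel, Liam|(ABC)
END

open my $fh, '<', \$text or die "cannot open in-memory file: $!";
read_records($fh, sub {
    my ($r) = @_;
    printf "%s %s [%s]: %s\n", $r->{first}, $r->{last}, $r->{code},
        join(', ', @{ $r->{items} || [] });
});
```

Because each record is passed to the callback as soon as its closing brace (or its header-only line) is read, only one record is held in memory at a time. As an aside, Perl does not guarantee hash key order, so the key order in the Dumper output above may differ between runs; setting `$Data::Dumper::Sortkeys = 1` before calling `Dumper` makes it deterministic.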
