user3046061

Reputation: 353

Reading CSV-ish type records of varying length from a file

The file looks like this:

Nolan, Randall|(XYZ) {
  Bronco,
  Patient,
  New,
}
Tryor, Neil|(ABC) {
  Doyle,
  Agg,
}
Daniel, Liam|(ABC)
Taylor, Greg|(XYZ)

Notes about the records: Each record begins with an ID line of the form Last_name, First_name|(CODE); the last two lines above consist of nothing but this ID. A record may optionally be followed by a pair of braces. Inside the braces are comma-separated items, each placed on its own line to make the file easier to read. If a record has no items, the braces are omitted entirely (as in the last two records above). If the braces are present, they contain at least one item (as in the first two records).
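As a rough sketch of the ID-line format described above, a single regular expression can split such a line into its parts and report whether a braces section follows. The sub name parse_header and the pattern details (for example, that codes consist only of uppercase letters) are assumptions made for illustration, not something the file format is known to guarantee.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "Last_name, First_name|(CODE)" with an optional
# trailing "{". Returns a hash ref, or undef if the line is not an ID line.
sub parse_header {
    my ($line) = @_;
    return unless $line =~ /^\s*([^,|]+),\s*([^|]+?)\s*\|\(([A-Z]+)\)\s*(\{)?\s*$/;
    return {
        last      => $1,
        first     => $2,
        code      => $3,
        has_items => defined $4 ? 1 : 0,   # true when an opening brace follows
    };
}

my $header = parse_header('Nolan, Randall|(XYZ) {');
print "$header->{first} $header->{last} [$header->{code}]\n";
```

Item lines such as "  Bronco," do not match the pattern, so the same check can be used to tell ID lines apart from items.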

What I want to do is take all of the information for each person and process it.

The problem is that I don't know how many items there are between the braces for each record, and I would like to use all of a record's fields in some further processing of what I extract from the file.

A solution that I thought of is to have an array of hashes like this:

my $records = [
    {
        First_Name => 'Bob',
        Last_Name  => 'Dolan',
        Code       => 'XYZ',
        Items      => [
            'item1',
            'item2',
            # ...
        ],
    },
    # ... one hash per record
];

So this way I'd have all the data I want in one place but I would need to iterate through the data structure and process it that way.
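To make that iteration concrete, here is a minimal sketch of walking such an array of hashes. The second record and the describe helper are made up for illustration, and the // operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Sample data in the shape described above (values are illustrative only).
my $records = [
    { First_Name => 'Bob', Last_Name => 'Dolan', Code => 'XYZ', Items => [ 'item1', 'item2' ] },
    { First_Name => 'Ann', Last_Name => 'Smith', Code => 'ABC' },   # no Items key
];

# Hypothetical helper: build a one-line summary of a record.
sub describe {
    my ($rec) = @_;
    my @items = @{ $rec->{Items} // [] };   # treat a missing key as an empty list
    return sprintf '%s %s (%s): %d item(s)',
        @{$rec}{qw(First_Name Last_Name Code)}, scalar @items;
}

print describe($_), "\n" for @$records;
```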

That seems like a primitive solution. What would be a better solution where I could process the data on the fly, as I am reading it from the file while having the necessary checks and validation on the information in the optional braces section?

Upvotes: 0

Views: 73

Answers (1)

ThisSuitIsBlackNot

Reputation: 24073

The following reads the data line by line, storing it in a hash as it goes. When the end of a record is reached (a line ending in either ")" or "}"), you can process the hash (here I simply print it with Data::Dumper).

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

my %record;
while (<DATA>) {
    chomp;

    # Assumes that none of the lines inside braces can contain "|"
    if (/\|/) {
        my ($name, $code) = split /\|/;
        my ($last, $first) = split /,\s*/, $name;
        ($code) = $code =~ /\((.*)\)/;

        $record{first} = $first;
        $record{last} = $last;
        $record{code} = $code;
    }
    elsif (/,$/) {
        # Trim a copy, so the end-of-record test below still sees the
        # original line and whitespace inside an item is preserved
        (my $item = $_) =~ s/^\s+|\s*,$//g;
        push @{ $record{items} }, $item;
    }

    # End of record, process it
    if (/[})]$/) {
        print Dumper(\%record);

        # Clear record after processing
        %record = ();
    }
}

__DATA__
Nolan, Randall|(XYZ) {
  Bronco,
  Patient,
  New,
}
Tryor, Neil|(ABC) {
  Doyle,
  Agg,
}
Daniel, Liam|(ABC)
Taylor, Greg|(XYZ)

Output (the order of the hash keys may differ, since Perl does not guarantee hash ordering):

$VAR1 = {
          'first' => 'Randall',
          'last' => 'Nolan',
          'code' => 'XYZ',
          'items' => [
                       'Bronco',
                       'Patient',
                       'New'
                     ]
        };
$VAR1 = {
          'first' => 'Neil',
          'last' => 'Tryor',
          'code' => 'ABC',
          'items' => [
                       'Doyle',
                       'Agg'
                     ]
        };
$VAR1 = {
          'first' => 'Liam',
          'last' => 'Daniel',
          'code' => 'ABC'
        };
$VAR1 = {
          'first' => 'Greg',
          'last' => 'Taylor',
          'code' => 'XYZ'
        };
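
The question also asks about checks and validation while reading. One possible approach, sketched below, is to call a validating sub at the point where the record is complete, in place of the print Dumper call. The sub name validate_record and the specific rules (name parts present, a code made of uppercase letters, no empty items) are assumptions chosen for illustration; the real rules depend on the data.

```perl
use strict;
use warnings;

# Hypothetical validator: returns a list of problems (empty if the record is OK).
sub validate_record {
    my ($rec) = @_;
    my @errors;
    for my $field (qw(first last code)) {
        push @errors, "missing $field"
            unless defined $rec->{$field} && length $rec->{$field};
    }
    push @errors, "bad code '$rec->{code}'"
        if defined $rec->{code} && $rec->{code} !~ /^[A-Z]+$/;
    # The braces section is optional, but if present it must hold real items.
    if (exists $rec->{items}) {
        push @errors, 'empty item' if grep { !length } @{ $rec->{items} };
    }
    return @errors;
}

my %record = (first => 'Neil', last => 'Tryor', code => 'ABC', items => [ 'Doyle', 'Agg' ]);
my @errors = validate_record(\%record);
print @errors ? "Invalid: @errors\n" : "OK: $record{last}, $record{first}\n";
```

Because each record is checked and then cleared as soon as it is complete, only one record is held in memory at a time.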

Upvotes: 1
