mropa
mropa

Reputation: 11842

Transforming irregular data into a structured table with Perl

I would like to transform the following lines of data into a structured table with perl:

3=884|4=884|5=MBREFUSDCMR|25=113500|824=20120229|387=4.15|248=TLD|280=5
3=884|4=884|5=MBREFUSDCMR|24=1|12=0|10=0|25=113500|824=20120229|280=5
3=884|4=884|5=MBREFUSDCMR|24=1|270=252304|280=5

Each value before the equal sign represents the identifier of the corresponding value. Now I like to produce a table for further analysis. The first row should include the identifiers and below their values. There is an unequal amount of identifiers in the three rows which have to be kept in mind. Missing values should be filled with the string "NA". So it should look like this

3|4|5|24|12|10|25|824|387|248|270|280
884|884|MBREFUSDCMR|NA|NA|NA|113500|20120229|4.15|TLD|NA|5
884|884|MBREFUSDCMR|1|0|0|113500|20120229|NA|NA|NA|5
884|884|MBREFUSDCMR|1|NA|NA|NA|NA|NA|NA|252304|5

@DVK

My approach was to first get all the variables. That would be the header/first row for the table. E.g.

my @data_dirty = <STDIN>;
chomp(@data_dirty);

## get the columns names
my ( @tmp, @var );
foreach my $j (@data_dirty) {
    foreach my $i (split /\|/, $j) {
    $i =~ s/\[.*\]//g;
    $i =~ s/\s+//g;
    $i =~ s/(.*)=.*/$1/g;
    push(@tmp, $i);
    }
}
@var = uniq @tmp;

Afterwards I might check each row if the variable in @var exists, if yes write the corresponding value otherwise write an "NA". However, I had some troubles with the checks and the correct storing of the data so that afterwards the output looks as desired.

Upvotes: 2

Views: 150

Answers (1)

Zaid
Zaid

Reputation: 37146

The key to addressing such needs is to divide and conquer. A careful choice of subroutines is all you need.

Here, data needs to be loaded and all identifiers need to be known a priori before any table is printed.

The example below doesn't order the columns by their sequence of appearance (left as an exercise for the reader):

use strict;
use warnings;

my ( $data, $headers ) = load_data( 'tabular_data.txt' );

print_tabular( $data, $headers );

sub load_data {

    my ( $file ) = @_;

    open my $fh, '<', $file or die $!;

    my ( @records, %fields );
    while ( my $line = <$fh> ) {

        chomp $line;

        my @columns = split /\|/, $line;               # Get columns
        my %entries = map { split /=/, $_ } @columns;  # Populate record,
                                                       # keys = fields
        push @records, \%entries;                      # Add to data collection

        $fields{$_}++ for keys %entries;               # Detect new headers
    }

    close $fh;

    return ( \@records, [ keys %fields ] );
}

sub print_tabular {

    my ( $data, $headers ) = @_;

    pretty_print( @$headers );

    for my $record ( @$data ) {
        my @values = map { exists $record->{$_}  # exists check needed...
                             ? $record->{$_}     # ... otherwise header '0'
                             : 'NA'              # ... would always print 'NA'
                         } @$headers;
        pretty_print( @values );
    }
}

sub pretty_print { print join( '|', @_ ), "\n" }

Upvotes: 2

Related Questions