gagneet

How can I convert these strings to a hash in Perl?

I wish to convert a single string with multiple delimiters into a key=>value hash structure. Is there a simple way to accomplish this? My current implementation is:

sub readConfigFile() {
    my %CONFIG;
    my $index = 0;
    open(CON_FILE, "config");
    my @lines = <CON_FILE>;
    close(CON_FILE);

    my @array = split(/>/, $lines[0]);
    my $total = @array;

    while($index < $total) {
        my @arr = split(/=/, $array[$index]); 
        chomp($arr[1]);
        $CONFIG{$arr[0]} = $arr[1];       
        $index = $index + 1; 
    }

    while ( ($k,$v) = each %CONFIG ) {
        print "$k => $v\n";
    }

    return;
}

where 'config' contains:

pub=3>rec=0>size=3>adv=1234 123 4.5 6.00
pub=1>rec=1>size=2>adv=111 22 3456 .76

The last number on each line (6.00 and .76 above) also needs to be removed and kept in a separate key=>value pair whose key can be 'ip'. (I have not been able to do this without making the code too long and complicated.)

Answers (5)

brian d foy

What is your configuration data structure supposed to look like? So far the solutions only record the last line because they are stomping on the same hash keys every time they add a record.

Here's something that might get you closer, but you still need to figure out what the data structure should be.

  • I pass in the file handle as an argument so my subroutine isn't tied to a particular way of getting the data. It can be from a file, a string, a socket, or even the stuff below __DATA__ in this case.

  • Instead of fixing things up after I parse the string, I fix the string to have the "ip" element before I parse it. Once I do that, the "ip" element isn't a special case and it's just a matter of a double split. This is a very important technique to save a lot of work and code.

  • I create a hash reference inside the subroutine and return that hash reference when I'm done. I don't need a global variable. :)
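The second point can be seen on one line of the sample data. After the substitution, `ip` is just another `key=value` field, so the same double split handles it. This is a small illustration; the variable names `$line`, `$normalised` and `%record` are only for this sketch:

```perl
use strict;
use warnings;

my $line = 'pub=1>rec=1>size=2>adv=111 22 3456 .76';

# Step 1: rewrite the trailing number as one more key=value field
(my $normalised = $line) =~ s/\s+(\d*\.\d+)$/>ip=$1/;
print "$normalised\n";    # pub=1>rec=1>size=2>adv=111 22 3456>ip=.76

# Step 2: the same double split now handles every field, ip included
my %record = map { split /=/ } split />/, $normalised;
print "$_ => $record{$_}\n" for sort keys %record;
```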

use warnings;
use strict;

use Data::Dumper;

sub readConfigFile {
    my( $fh ) = shift;
    
    my $hash = {};
    
    while( <$fh> ) {
        chomp;
        
        s/\s+(\d*\.\d+)$/>ip=$1/;

        $hash->{ $. } = { map { split /=/ } split />/ };
    }

    return $hash;
}

my $hash = readConfigFile( \*DATA );

print Dumper( $hash );

__DATA__
pub=3>rec=0>size=3>adv=1234 123 4.5 6.00
pub=1>rec=1>size=2>adv=111 22 3456 .76

This gives you a data structure where each line is a separate record. I choose the line number of the record ($.) as the top-level key, but you can use anything that you like.

$VAR1 = {
          '1' => {
                   'ip' => '6.00',
                   'rec' => '0',
                   'adv' => '1234 123 4.5',
                   'pub' => '3',
                   'size' => '3'
                 },
          '2' => {
                   'ip' => '.76',
                   'rec' => '1',
                   'adv' => '111 22 3456',
                   'pub' => '1',
                   'size' => '2'
                 }
        };

If that's not the structure you want, show us what you'd like to end up with and we can adjust our answers.
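Because the subroutine only needs a filehandle, the same loop can read directly from a string through an in-memory filehandle (`open` on a scalar reference). This is handy for testing. The following is a sketch under that assumption. `parse_config` and `$text` are hypothetical names, and the loop body is the one from the answer:

```perl
use strict;
use warnings;

# Same loop as readConfigFile above (hypothetical name parse_config)
sub parse_config {
    my ($fh) = @_;
    my $hash = {};
    while (<$fh>) {
        chomp;
        s/\s+(\d*\.\d+)$/>ip=$1/;                        # trailing number -> ip=...
        $hash->{$.} = { map { split /=/ } split />/ };   # keyed by line number
    }
    return $hash;
}

my $text = "pub=3>rec=0>size=3>adv=1234 123 4.5 6.00\n"
         . "pub=1>rec=1>size=2>adv=111 22 3456 .76\n";

# open() on a scalar reference gives a read handle on the string itself
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my $config = parse_config($fh);
close $fh;

# Top-level keys are line numbers, so a numeric sort gives file order
for my $n (sort { $a <=> $b } keys %$config) {
    print "line $n: ip=$config->{$n}{ip}\n";
}
```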

Jonathan Leffler

The config file format is sub-optimal, shall we say. That is, there are easier formats to parse and understand. [Added: but the format is already defined by another program. Perl is flexible enough to deal with that.]

Your code slurps the file when there is no real need.

Your code only pays attention to the first line of data in the file ($lines[0]) (as Chris Charley noted while I was typing this up).

You also have not allowed for comment lines or blank lines - both are a good idea in any config file and they are easy to support. [Added: again, with the pre-defined format, this is barely relevant, but when you design your own files, do remember it.]

Here's an adaptation of your function into somewhat more idiomatic Perl.

#!/usr/bin/perl
use strict;
use warnings;
use constant debug => 0;

sub readConfigFile()
{
    my %CONFIG;
    open(CON_FILE, "config") or die "failed to open file ($!)\n";

    while (my $line = <CON_FILE>)
    {
        chomp $line;
        $line =~ s/#.*//;           # Remove comments
        next if $line =~ /^\s*$/;   # Ignore blank lines

        foreach my $field (split(/>/, $line))
        {
            my @arr = split(/=/, $field);
            $CONFIG{$arr[0]} = $arr[1];
            print ":: $arr[0] => $arr[1]\n" if debug;
        }
    }
    close(CON_FILE);

    while (my($k,$v) = each %CONFIG)
    {
        print "$k => $v\n";
    }
    return %CONFIG;
}

readConfigFile;    # Ignores returned hash

Now, you need to explain more clearly what the structure of the last field is, and why you have an 'ip' field without the key=value notation. Consistency makes life easier for everybody. You also need to think about how multiple lines are supposed to be handled. And I'd explore using a more orthodox notation, such as:

pub=3;rec=0;size=3;adv=(1234,123,4.5);ip=6.00

Colons or semicolons as delimiters are fairly conventional; parentheses around comma-separated items in a list are not an outrageous convention. Consistency is paramount. Emerson said "A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines", but consistency in Computer Science is a great benefit to everyone.
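With a format like this, every field is `key=value` and list values are marked explicitly, so the parser needs no special cases. The following is a minimal sketch. It assumes exactly the syntax in the example line above: `;` between fields, `(a,b,c)` for lists, and no escaping. `parse_line` is a hypothetical name:

```perl
use strict;
use warnings;

# Parse one line of the suggested format, e.g.
#   pub=3;rec=0;size=3;adv=(1234,123,4.5);ip=6.00
# Parenthesised values become array references; everything else stays a string.
# Assumes values never contain ';' and list items never contain ','.
sub parse_line {
    my ($line) = @_;
    my %rec;
    for my $field (split /;/, $line) {
        my ($key, $value) = split /=/, $field, 2;
        next unless defined $value;              # skip fields without '='
        if ($value =~ /^\((.*)\)$/) {
            my $items = $1;
            $value = [ split /,/, $items ];      # list value -> array ref
        }
        $rec{$key} = $value;
    }
    return \%rec;
}

my $rec = parse_line('pub=3;rec=0;size=3;adv=(1234,123,4.5);ip=6.00');
print "ip=$rec->{ip}, second adv item=$rec->{adv}[1]\n";
```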

Chris Charley

I am assuming that you want to read and parse more than one line, so I chose to store the values in an AoH (array of hashes).

#!/usr/bin/perl
use strict;
use warnings;

my @config;

while (<DATA>) {
    chomp;
    push @config, { split /[=>]/ };
}

for my $href (@config) {
    while (my ($k, $v) = each %$href) {
        print "$k => $v\n";
    }
}

__DATA__
pub=3>rec=0>size=3>adv=1234 123 4.5 6.00
pub=1>rec=1>size=2>adv=111 22 3456 .76

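The while loop reads the sample lines from DATA. This results in the printout below (hash order is not fixed, so the keys within each record may come out in a different order).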

rec => 0
adv => 1234 123 4.5 6.00
pub => 3
size => 3
rec => 1
adv => 111 22 3456 .76
pub => 1
size => 2

Chris
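The single split on `[=>]` leaves the trailing number inside `adv`. If you also want the question's `ip` field, one option is to turn the last space-separated token into one more `>ip=...` field before splitting. This is an addition, not part of the answer above; it uses the same idea as brian d foy's answer. The sample lines are held in an array instead of DATA so that the sketch is self-contained:

```perl
use strict;
use warnings;

my @lines = (
    'pub=3>rec=0>size=3>adv=1234 123 4.5 6.00',
    'pub=1>rec=1>size=2>adv=111 22 3456 .76',
);

my @config;
for (@lines) {
    my $line = $_;                     # copy so @lines is left untouched
    $line =~ s/\s+(\S+)$/>ip=$1/;      # last space-separated token becomes ip=...
    push @config, { split /[=>]/, $line };
}

# Sort the keys so the output order is stable
for my $href (@config) {
    print join(', ', map { "$_ => $href->{$_}" } sort keys %$href), "\n";
}
```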

Altreus

The code below assumes the delimiter is guaranteed to be '>' and that '>' can never appear in the data.

I simply split each line on '>'. The last field contains a key=value pair, then a space, then the IP. Because the value itself contains spaces (adv=1234 123 4.5 6.00), split that field at its last space rather than its first, and you get the k=v and the IP. Save the IP to the hash and keep the k=v pair in the array, then go through the array and split each k=v on '='.

Fill in the hashref and push it to your higher-scoped array. This will then contain your hashrefs when finished.

(Assuming the config has already been loaded into the array @config:)

my @hashes;

for my $line (@config) {
    my $hash; # config line will end up here

    my @pairs = split />/, $line;

    # Do the ip first. Split the last element of @pairs at its last space and put
    # the second half into the hash, overwriting the element with the first half
    # at the same time. This means we don't have to do anything special with the
    # for loop below.
    ($pairs[-1], $hash->{ip}) = $pairs[-1] =~ /^(.*)\s+(\S+)$/;

    for (@pairs) {
        my ($k, $v) = split /=/;
        $hash->{$k} = $v;
    }

    push @hashes, $hash;
}
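Where the last field is split matters. The LIMIT argument to `split` caps the number of fields, so `split / /, $str, 2` breaks at the first space. A greedy regex anchored at the end of the string breaks at the last space instead. The following compares the two on the last field of the first sample line; the variable names are illustrative only:

```perl
use strict;
use warnings;

my $last = 'adv=1234 123 4.5 6.00';    # last '>'-separated field of line 1

# split with LIMIT 2 stops after the FIRST space
my ($kv_first, $ip_first) = split / /, $last, 2;
print "limit 2: [$kv_first] [$ip_first]\n";

# A greedy (.*) backtracks only as far as the LAST run of whitespace
my ($kv_last, $ip_last) = $last =~ /^(.*)\s+(\S+)$/;
print "regex:   [$kv_last] [$ip_last]\n";
```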

Martin Redmond

Here's one way.


use Data::Dumper;

# @lines holds the lines of the config file (e.g. my @lines = <CON_FILE>;)
foreach ( @lines ) {
  chomp;
  my %CONFIG;
  # Extract the trailing number (the ip) first and replace it with an
  # end-of-pair delimiter.
  if ( s/\s*([\d\.]+)\s*$/>/ ) {
    $CONFIG{ip} = $1;
  }
  while ( /([^=]*)=([^>]*)>/g ) {
    $CONFIG{$1} = $2;
  }
  print Dumper( \%CONFIG );
}
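The `while (/.../g)` loop walks the matches one at a time. In list context, the same kind of pattern returns all captured groups at once as alternating keys and values, so it can be assigned straight to a hash. This variation is not part of the answer. It keeps the answer's first step of replacing the trailing number with a closing `>`, and it works on one sample line instead of `@lines`:

```perl
use strict;
use warnings;

my $line = 'pub=3>rec=0>size=3>adv=1234 123 4.5 6.00';
my %config;

# Pull off the trailing number first, as in the answer above
if ($line =~ s/\s*([\d.]+)\s*$/>/) {
    $config{ip} = $1;
}

# In list context, m//g returns every (key, value) capture pair in order
%config = (%config, $line =~ /([^=>]+)=([^>]*)>/g);

print "$_ => $config{$_}\n" for sort keys %config;
```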
