Perler

Reputation: 43

How could I sort this hash array?

I would like to sort the contents of @{ $hash{name} } alphabetically while keeping the elements of the other arrays, @{ $hash{$key} }, aligned with it.

How can I do that?

my %hash = (
    date => [
        qw(
            2018/01/12
            2018/03/01
            2018/03/20
            2018/04/04
        )
    ],
    time => [
        qw(
            03:00:02
            01:00:01
            00:24:39
            11:33:33            
        )
    ],
    name => [
        qw(
            jerry
            tom
            micky
            agata            
        )
    ]
);

Desired Output:

date;time;name
2018/04/04;11:33:33;agata
2018/01/12;03:00:02;jerry
2018/03/20;00:24:39;micky
2018/03/01;01:00:01;tom

I haven't tried anything yet, because I don't know where to start.

Upvotes: 2

Views: 162

Answers (1)

simbabque

Reputation: 54333

"I chose this solution because I would like to do another sort order by date in the script. With key reference it will be more smart..."

This is the core part of your problem. Your thinking is correct, but you implemented it in the wrong way. That's what has put you into this corner now.

Let's take a look at your data first. You said it's a log file, so it's line-based. I've made this format up.

On 2018/01/12 at 03:00:02 user jerry did stuff.
On 2018/03/01 at 01:00:01 user tom did stuff.
On 2018/03/20 at 00:24:39 user micky did stuff.
On 2018/04/04 at 11:33:33 user agata did stuff.

And your expected output is a CSV file. Again, this is line-based.

date;time;name
2018/04/04;11:33:33;agata
2018/01/12;03:00:02;jerry
2018/03/20;00:24:39;micky
2018/03/01;01:00:01;tom

So it stands to reason that the structure you want the data to be in is still line-based.

When you want to sort this data by any of the columns, you are sorting rows based on values in the columns. So you really want to store the rows, in a way that makes it easy to access the values of each individual column for that row. This becomes especially clear once you look at it in a spreadsheet.

[Screenshot: LibreOffice spreadsheet with the example data]

Each of the columns is one value of a row. So let's do that.

my @events; # or something like that
while (my $row = <$log_fh>) {
    my ( $date, $time, $name ) = parse_row($row); # we don't care about this implementation

    push @events, {
        date => $date,
        time => $time,
        name => $name,
    };
}
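
For the made-up log format above, parse_row could be sketched with a single regular expression. The pattern is an assumption that fits only the invented sample lines; a real log file needs its own pattern.

```perl
use strict;
use warnings;

# Hypothetical parser for the made-up format
# "On <date> at <time> user <name> did stuff."
sub parse_row {
    my ($row) = @_;
    my ( $date, $time, $name ) = $row =~ m{
        ^On \s+ (\d{4}/\d{2}/\d{2})      # date, for example 2018/01/12
        \s+ at \s+ (\d{2}:\d{2}:\d{2})   # time, for example 03:00:02
        \s+ user \s+ (\S+)               # user name
    }x or die "Cannot parse line: $row\n";
    return ( $date, $time, $name );
}

my ( $date, $time, $name )
    = parse_row("On 2018/01/12 at 03:00:02 user jerry did stuff.\n");
print "$date;$time;$name\n";    # prints 2018/01/12;03:00:02;jerry
```

Dying on a line that does not match is a design choice for this sketch; skipping or logging such lines may suit a real log better.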

Now we have this data structure (which I've output with Data::Printer).

[
    [0] {
        date   "2018/01/12",
        name   "jerry",
        time   "03:00:02"
    },
    [1] {
        date   "2018/03/01",
        name   "tom",
        time   "01:00:01"
    },
    [2] {
        date   "2018/03/20",
        name   "micky",
        time   "00:24:39"
    },
    [3] {
        date   "2018/04/04",
        name   "agata",
        time   "11:33:33"
    }
]

As you can see, there is one hash reference per line, and that contains a key for the date, one for the time and one for the name.

Now we can sort on any of the keys inside of those structures. That's easy.

my @events_by_name = sort { $a->{name} cmp $b->{name} } @events;
my @events_by_date = sort { $a->{date} cmp $b->{date} } @events;
my @events_by_time = sort { $a->{time} cmp $b->{time} } @events;
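
Because each row is a hash reference, a sort can also fall back to a second column when the first one ties. Here is a minimal sketch with sample rows invented for illustration (two of them share a date). Plain cmp gives chronological order here only because the dates and times are zero-padded and run from the largest unit to the smallest.

```perl
use strict;
use warnings;

# Sample rows invented for illustration; two of them share a date.
my @events = (
    { date => '2018/03/01', time => '09:15:00', name => 'tom' },
    { date => '2018/01/12', time => '03:00:02', name => 'jerry' },
    { date => '2018/03/01', time => '01:00:01', name => 'micky' },
);

# Compare by date first; "or" falls through to the time
# comparison only when the dates are equal (cmp returned 0).
my @chronological = sort {
    $a->{date} cmp $b->{date}
        or
    $a->{time} cmp $b->{time}
} @events;

# Print each row as date;time;name using a hash slice.
print join( ';', @{$_}{qw(date time name)} ), "\n" for @chronological;
# 2018/01/12;03:00:02;jerry
# 2018/03/01;01:00:01;micky
# 2018/03/01;09:15:00;tom
```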

And then you can produce CSV files for each of them.

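open my $fh, '>', 'events_by_name.csv' or die $!;
print $fh "date;time;name\n";    # header line, as in the desired output
foreach my $event (@events_by_name) {
    # same column order as the header
    print $fh join ';', $event->{date}, $event->{time}, $event->{name};
    print $fh "\n";
}
close $fh;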

Or you could iterate by index, open several files at the same time, and loop only once. This works because every sorted list contains the same events and therefore has the same length.

open my $fh_name, '>', 'events_by_name.csv' or die $!;
open my $fh_date, '>', 'events_by_date.csv' or die $!;
for (my $i = 0; $i < @events_by_name; $i++) {
    # row $i of the list sorted by name
    print $fh_name join(
         ';',
         $events_by_name[$i]{date},
         $events_by_name[$i]{time},
         $events_by_name[$i]{name},
    );
    print $fh_name "\n";

    # row $i of the list sorted by date
    print $fh_date join(
         ';',
         $events_by_date[$i]{date},
         $events_by_date[$i]{time},
         $events_by_date[$i]{name},
    );
    print $fh_date "\n";
}
close $fh_name;
close $fh_date;

You can further shorten this by using another loop.

open my $fh_name, '>', 'events_by_name.csv' or die $!;
open my $fh_date, '>', 'events_by_date.csv' or die $!;
for (my $i = 0; $i < @events_by_name; $i++) {
    # pair each file handle with the sorted list that belongs to it
    foreach my $output ( [ $fh_name, \@events_by_name ], [ $fh_date, \@events_by_date ] ) {
        my ( $fh, $events ) = @$output;
        print $fh join(
             ';',
             $events->[$i]{date},
             $events->[$i]{time},
             $events->[$i]{name},
        );
        print $fh "\n";
    }
}
close $fh_name;
close $fh_date;
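
Going one step further, the sort key itself can be the loop variable, so each file is sorted and written in one pass. This is only a sketch: the sample rows are copied from the question, the header line follows the desired output, and the scratch directory from File::Temp is an assumption made so the example can run anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Sample rows taken from the question.
my @events = (
    { date => '2018/01/12', time => '03:00:02', name => 'jerry' },
    { date => '2018/03/01', time => '01:00:01', name => 'tom' },
    { date => '2018/03/20', time => '00:24:39', name => 'micky' },
    { date => '2018/04/04', time => '11:33:33', name => 'agata' },
);

my @columns = qw(date time name);
my $dir     = tempdir( CLEANUP => 1 );    # assumption: scratch directory

# One CSV file per sort key, each with a header line.
for my $key (@columns) {
    my $file = "$dir/events_by_$key.csv";
    open my $fh, '>', $file or die "Cannot open $file: $!";
    print $fh join( ';', @columns ), "\n";
    for my $event ( sort { $a->{$key} cmp $b->{$key} } @events ) {
        # the hash slice keeps the column order identical to the header
        print $fh join( ';', @{$event}{@columns} ), "\n";
    }
    close $fh or die "Cannot close $file: $!";
}
```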

As you can see, it makes a lot more sense to keep the structure line-based when you are dealing with lines.
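
If the data already sits in the column-based %hash from the question, it can be turned into rows first. The following is a minimal sketch, assuming that all three arrays have the same length and that index i of each array belongs to the same event.

```perl
use strict;
use warnings;

# The column-based structure from the question.
my %hash = (
    date => [qw(2018/01/12 2018/03/01 2018/03/20 2018/04/04)],
    time => [qw(03:00:02 01:00:01 00:24:39 11:33:33)],
    name => [qw(jerry tom micky agata)],
);

my @columns = qw(date time name);

# Build one hash per index from the i-th element of every column.
my @events;
for my $i ( 0 .. $#{ $hash{name} } ) {
    my %row = map { ( $_ => $hash{$_}[$i] ) } @columns;
    push @events, \%row;
}

# Sort the rows by name and print them as CSV.
print join( ';', @columns ), "\n";
print join( ';', @{$_}{@columns} ), "\n"
    for sort { $a->{name} cmp $b->{name} } @events;
# date;time;name
# 2018/04/04;11:33:33;agata
# 2018/01/12;03:00:02;jerry
# 2018/03/20;00:24:39;micky
# 2018/03/01;01:00:01;tom
```

Sorting by date instead only requires changing the key in the sort block.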

Upvotes: 3
