Reputation: 19
How do I read a file into an array, but correctly handle duplicates?
I have a file consisting of two columns, a name and a number. Eventually the names will repeat, with a number that may or may not be different.
Rita,13
Sue,11
Bob,01
Too,05
Rita,13
Sue,07
Bob,02
Too,05
Reading these lines into an array is not a problem. The trickier part, for me at least, is repopulating them so that the value from any duplicate name is appended to the line where that name first appeared.
So the input above should produce something like
Rita,13,13
Sue,11,07
Bob,01,02
Too,05,05
There are about 3000 names, and about 600,000 lines to process. (The idea is to highlight which names are stable and which have changing values).
Speed does not matter too much. This will be run about once a week.
Each output line will end up with multiple entries, and the 2nd column does not matter much in itself (I only need to read it and append it to the new list). So I am thinking I do not need a hash here and could just iterate through the input file with some form of "if exists" check. Am I right, or would a hash be beneficial?
I am using Strawberry Perl V5.32.1 on Windows.
EDIT - thanks for samples, all worked great.
Due to a change in the output, the input file now has extra columns which must remain.
So now it looks like
12,Rita,1,4,13
2,Sue,0,1,11
5,Bob,12,5,01
7,Too,1,4,05
12,Rita,1,4,13
2,Sue,0,1,07
5,Bob,12,5,02
7,Too,1,4,05
and the output would be similar, in that only the last column gets extra values appended
12,Rita,1,4,13,13
2,Sue,0,1,11,07
5,Bob,12,5,01,02
7,Too,1,4,05,05
The extra columns will not change but must be there. Does it still make sense to use an array and pluck the 2nd and 5th columns to build it, or should I change the delimiter for the first 4 columns so that they collapse into a single first column that becomes the unique key? That feels dirty but would work...
Upvotes: 0
Views: 106
Reputation: 67900
The basic solution to your problem has been given, but two things were overlooked: 1) a hash alone does not preserve the original order of the names, 2) if any line has multiple values, the additional values are discarded. So I came up with this:
use strict;
use warnings;
use feature 'say';
my %d;
my @keys;
while (<DATA>) {
chomp;
my ($key, @val) = split /,/, $_; # use array to store multiple values
push @keys, $key if not defined $d{$key}; # preserve original order of lines
push @{ $d{$key} }, @val; # store values
}
say join ",", $_, @{ $d{$_} } for @keys; # print lines in original order
__DATA__
Rita,13
Sue,11
Bob,01,23
Too,05
Rita,13
Sue,07
Bob,02
Too,05
Output:
Rita,13,13
Sue,11,07
Bob,01,23,02
Too,05,05
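For the extended five-column input from the question's edit, the same idea can be sketched by keying on the name and keeping the leading columns from the first occurrence only. This is a minimal sketch, not a definitive solution. It assumes the name is always the 2nd column, the changing value is always the last column, and the other columns are identical on every repeat. The helper name merge_by_name is made up for illustration.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: merge lines that share a name, appending only the
# last column of each repeat. Assumes name = 2nd column, value = last column.
sub merge_by_name {
    my @lines = @_;
    my (%prefix, %vals, @order);
    for my $line (@lines) {
        chomp $line;
        next unless length $line;            # skip blank lines
        my @f     = split /,/, $line, -1;    # -1 keeps trailing empty fields
        my $value = pop @f;                  # the changing value
        my $name  = $f[1];
        unless (exists $vals{$name}) {
            push @order, $name;              # remember first-seen order
            $prefix{$name} = join ',', @f;   # fixed columns, stored once
        }
        push @{ $vals{$name} }, $value;
    }
    return map { join(',', $prefix{$_}, @{ $vals{$_} }) } @order;
}

my @input = (
    '12,Rita,1,4,13', '2,Sue,0,1,11', '5,Bob,12,5,01', '7,Too,1,4,05',
    '12,Rita,1,4,13', '2,Sue,0,1,07', '5,Bob,12,5,02', '7,Too,1,4,05',
);
say for merge_by_name(@input);
```

With the sample data this prints the four merged lines in their original order, starting with 12,Rita,1,4,13,13. To run it against a real file, pass the lines read from <> instead of @input. There is no need to change the delimiter: the hash key only has to be the name, and the remaining columns ride along as a stored string.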
Upvotes: 4
Reputation: 6798
Using a hash with the name as the key and the ids stored in an array makes this task easy to implement.
Please have a look at the following code snippet.
use strict;
use warnings;
use feature 'say';
my $data;
while( <DATA> ) {
chomp;
my($name,$id) = split /,/;
push @{$data->{$name}}, $id;
}
say join(',',$_,@{$data->{$_}}) for sort keys %$data;
exit 0;
__DATA__
Rita,13
Sue,11
Bob,01
Too,05
Rita,13
Sue,07
Bob,02
Too,05
Output
Bob,01,02
Rita,13,13
Sue,11,07
Too,05,05
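Since the stated goal is to highlight which names are stable and which have changing values, the hash of arrays can be extended with a small check. The following is only a sketch under assumptions: the helper name is_stable and the stable/changed labels are made up for illustration, and values are compared as strings, so 05 and 5 would count as different.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: true when all values passed in are the same string.
sub is_stable {
    my @values = @_;
    my %seen;
    $seen{$_} = 1 for @values;     # collect the distinct values
    return keys(%seen) <= 1;       # zero or one distinct value means stable
}

# Same structure the loop above builds: name => [ids in input order]
my $data = {
    Rita => ['13', '13'],
    Sue  => ['11', '07'],
    Bob  => ['01', '02'],
    Too  => ['05', '05'],
};

for my $name (sort keys %$data) {
    my $status = is_stable(@{ $data->{$name} }) ? 'stable' : 'changed';
    say join(',', $name, @{ $data->{$name} }, $status);
}
```

With the sample data, Rita and Too are reported as stable and Sue and Bob as changed.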
Upvotes: 4
Reputation: 241828
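As a one-liner (shown with Unix shell quoting; in Windows cmd.exe, wrap the code in double quotes and write the inner string as qq(,$F[1]) instead):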
perl -F, -lane '$h{ $F[0] } .= ",$F[1]"; END {print $_, $h{$_} for keys %h}' -- file
As a script:
#!/usr/bin/perl
use warnings;
use strict;
my %h;
while (<>) {
chomp;
my ($name, $number) = split /,/;
$h{$name} .= ",$number";
}
for my $name (keys %h) {
print $name, $h{$name}, "\n";
}
Upvotes: 4