Reputation: 95
I was wondering if you could help me with a coding problem that I can't get my head around. My tab-delimited data looks something like the following:
00001 AU:137 AU:150 AU:180
00001 AU:137 AU:170
00002 AU:180
00003 AU:147 AU:155
00003 AU:155
The output I want is:
00001 AU:137 AU:150 AU:180 AU:170
00002 AU:180
00003 AU:147 AU:155
So for each identifier in the first column, I want to merge the values from all of its lines and remove the duplicates, so that the result can be stored in a hash. I'm not sure how to handle my current data, because a hash can't have duplicate keys. I'm also not sure how to push the values into an array when the identifier is the same.
I apologize for not having any code to show. I did try quite a few approaches, but none of them look right, even to a newbie like myself.
Any help or suggestions would be greatly appreciated. Thank you so much for your time.
Upvotes: 4
Views: 1513
Reputation: 43683
Script:
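#!/usr/bin/perl
use strict;
use warnings;
my %hash;
# Return the distinct values of a list (order is not preserved).
sub uniq { return keys %{{map {$_=>1} @_}}; }
open my $fh, '<', 'input.txt' or die "Cannot open input.txt: $!";
while (<$fh>) {
# Append everything after the identifier, leading tab included.
$hash{$1} .= $2 if /^(\S+)(\s.*?)[\n\r]*$/;
}
close $fh;
foreach (sort keys %hash) {
# Drop the empty field that split returns for the leading tab.
my @elements = uniq grep { length } split /\t/, $hash{$_};
print join("\t", $_, sort @elements), "\n";
}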
Output:
00001 AU:137 AU:150 AU:170 AU:180
00002 AU:180
00003 AU:147 AU:155
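The question also asks how to push values into an array per identifier, and its desired output keeps values in first-seen order (AU:170 comes last for 00001) rather than sorted. A minimal sketch of that variant follows, assuming List::Util 1.45 or later for its `uniq` function. The helper name `merge_lines` and the inline sample lines are made up for illustration.

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);    # uniq was added in List::Util 1.45

# Hypothetical helper: merge tab-delimited lines by their first column.
sub merge_lines {
    my @lines = @_;
    my %items_for;                # identifier => array ref of values
    for my $line (@lines) {
        (my $copy = $line) =~ s/[\r\n]+\z//;    # strip the line ending
        my ($id, @items) = split /\t/, $copy;
        next unless defined $id && length $id;  # skip blank lines
        push @{ $items_for{$id} }, @items;      # the array is autovivified
    }
    # uniq keeps the first occurrence of each value, in input order.
    return map { join "\t", $_, uniq @{ $items_for{$_} } }
           sort keys %items_for;
}

print "$_\n" for merge_lines(
    "00001\tAU:137\tAU:150\tAU:180\n",
    "00001\tAU:137\tAU:170\n",
    "00002\tAU:180\n",
    "00003\tAU:147\tAU:155\n",
    "00003\tAU:155\n",
);
```

With the sample lines above, the first line printed should list AU:137, AU:150, AU:180, AU:170 in that order, which is the order shown in the question.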
Upvotes: 0
Reputation: 126732
The classic solution to this uses a hash; in fact a hash of hashes, as there are duplicate identifiers as well as duplicate values per identifier.
This program produces the output you need. It expects the data file to be passed on the command line.
use strict;
use warnings;
my %data;
while (<>) {
chomp;
my ($key, @items) = split /\t/;
$data{$key}{$_}++ for @items;
}
print join("\t", $_, sort keys %{$data{$_}}), "\n" for sort keys %data;
output
00001 AU:137 AU:150 AU:170 AU:180
00002 AU:180
00003 AU:147 AU:155
Or if you prefer a command-line solution
perl -F'\t' -lane '$k = shift @F; $d{$k}{$_}++ for @F; END { print join "\t", $_, sort keys %{$d{$_}} for sort keys %d }' myfile
(It may need a little tweaking as I can only test on Windows at present.)
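The program above only uses the keys of the inner hashes, but `$data{$key}{$_}++` also records how many times each value appeared for an identifier. The sketch below builds the same structure and reports those counts. The helper name `count_items` and the two sample lines are assumptions for illustration, not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper: build the hash of hashes and return a reference to it.
sub count_items {
    my @lines = @_;
    my %data;
    for my $line (@lines) {
        (my $copy = $line) =~ s/[\r\n]+\z//;   # strip the line ending
        my ($key, @items) = split /\t/, $copy;
        next unless defined $key;              # skip blank lines
        $data{$key}{$_}++ for @items;          # inner hashes are autovivified
    }
    return \%data;
}

my $data = count_items(
    "00001\tAU:137\tAU:150\tAU:180\n",
    "00001\tAU:137\tAU:170\n",
);
for my $item (sort keys %{ $data->{'00001'} }) {
    print "$item seen $data->{'00001'}{$item} time(s)\n";
}
```

Here AU:137 occurs on both sample lines, so its count is 2, while the other values have a count of 1.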
Upvotes: 3
Reputation: 8386
I hope this gives you some idea of how to solve your problem:
use strict;
use warnings;
use Data::Dumper;
my %hash = ();
while (<DATA>) {
chomp;
my (@row) = split(/\s+/);
my $firstkey = shift @row;
foreach my $secondkey (@row) {
$hash{$firstkey}{$secondkey}++;
}
}
print Dumper \%hash;
__DATA__
00001 AU:137 AU:150 AU:180
00001 AU:137 AU:170
00002 AU:180
00003 AU:147 AU:155
00003 AU:155
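The script above only dumps the nested structure. To get from there to the tab-delimited lines the question asks for, one possible sketch is shown below. The helper name `format_merged` is hypothetical, and the small `%hash` stands in for the structure the loop above builds.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a hash of hashes into one tab-delimited line
# per identifier, with identifiers and values both sorted.
sub format_merged {
    my ($hash) = @_;
    return map {
        my $id = $_;
        join "\t", $id, sort keys %{ $hash->{$id} };
    } sort keys %$hash;
}

# Stand-in for the %hash built by the loop above (the counts are unused here).
my %hash = (
    '00001' => { 'AU:137' => 2, 'AU:150' => 1, 'AU:180' => 1, 'AU:170' => 1 },
    '00002' => { 'AU:180' => 1 },
);
print "$_\n" for format_merged(\%hash);
```

Because the inner keys are sorted, the values come out in sorted order rather than in the order they first appeared in the input.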
Upvotes: 3