Paul Serikov
Paul Serikov

Reputation: 3121

Sort table (or 2-dimensional array) by same values in column

Example.

If input is

enter image description here

Output should be

enter image description here

So each row must consist only same values or undef if original unsorted column had no such string. Values in columns should be alphabetically sorted.

How to realize that type of sorting ?

P.S. Original task - we have some modules and we want to compare them visually for functions with similar names.

Upvotes: 1

Views: 157

Answers (4)

zdim
zdim

Reputation: 66901

A little manual but hopefully clear approach to "fill in" the missing spots: Gather a sorted reference of all values and use it to "pad" (with empty) for missing elements in each arrayref-column

use warnings;
use strict;
use feature 'say';
use Data::Dump qw(dd);
use List::MoreUtils qw(uniq);

my @data = (
    [ qw(abc def ghi xyz) ],
    [ qw(def jkl mno uvw xyz) ],
    [ qw(abc uvw xyz) ]
);    
my @all = uniq sort { $a cmp $b } map { @$_ } @data;  # reference

# Changes @data in place. Use on deep copy to preserve the original
for my $ary (@data) {
    my $cmp_at = 0;
    my @res;
    for my $i (0..$#all) {
        if ($ary->[$cmp_at] eq $all[$i]) {
            push @res, $ary->[$cmp_at];
            ++$cmp_at;
        }
        else {
            push @res, undef;
        }
    }
    $ary = \@res;  # overwrite arrayref in @data
}

dd \@data;

Another way is to go row-wise and inspect and print for each element, again using a reference list. The column-wise preprocessing used here should be more flexible and general, I think.

The above prints

[
  ["abc", "def", "ghi", undef, undef, undef, "xyz"],
  [undef, "def", undef, "jkl", "mno", "uvw", "xyz"],
  ["abc", undef, undef, undef, undef, "uvw", "xyz"],
]

where string "undef" produced by Data::Dump indicates array entries with nothing.


Now @data can be used to print in a chosen format, for example

use List::Util qw(max);

my $fmt = '%' . (max map { length } @all) . 's';
say join "\t", map { sprintf $fmt, $_ } qw(Foo Bar Baz);    
for my $i (0..$#{$data[0]}) {
    say join "\t", map { sprintf $fmt, $_->[$i]//'' } @data;
}

what prints a table in the fashion of the desired "output." For quick alignment I use the width of the longest word for all fields; this is most easily improved by using a text-formatting module.

If this goes to a spreadsheet, like tables in the question, then just join fields with a comma

open my $fh, '>', 'out.csv' or die "Can't open: $!";
say $fh join ',', qw(Foo Bar Baz);
foreach my $i (0..$#{$data[0]}) {
    say $fh join ',', map { $_->[$i]//'' } @data;
}

Upvotes: 2

Veltro
Veltro

Reputation: 33

Just another different solution using hashes and arrays and Hash::Merge. Looking at your example I concluded that each value from the table can be stored as a single hash entry as long as you keep track in which column they belong. After the unsorted hash has been created the program finishes with a example print algorithm to extract the data in the format that you want.

use warnings;
use strict;
use Hash::Merge ;

my @data = (
    { name => 'Foo', funcs => [qw/abc def ghi xyz/] },
    { name => 'Bar', funcs => [qw/def jkl mno uvw xyz/] },
    { name => 'Baz', funcs => [qw/abc uvw xyz/] },
);

my $merger = Hash::Merge->new('RETAINMENT_PRECEDENT');
my $unsorted = {} ;
for my $i ( 0..$#data) {
    my $tmpH = {} ;
    foreach( @{$data[$i]->{funcs}} ) {
        if( exists $tmpH->{ $_ } ) {
            push @{$tmpH->{ $_ }}, $i ;
        } else {
            $tmpH->{ $_ } = [ $i ] ;
        }
    } ;
    $unsorted = $merger->merge( $unsorted, $tmpH ) ;
}
print "Foo\tBar\tBaz\n" ;
foreach ( sort keys %{$unsorted} ) {
    my @txt;
    @txt[@{$unsorted->{ $_ }}] = ($_) x @{$unsorted->{ $_ }} ;
    {
        no warnings 'uninitialized';
        printf("%s\t%s\t%s\n", $txt[0], $txt[1], $txt[2]) ; 
    }
}
__END__
Foo     Bar     Baz
abc             abc
def     def
ghi
        jkl
        mno
        uvw     uvw
xyz     xyz     xyz

Upvotes: 1

Stefan Becker
Stefan Becker

Reputation: 5962

The following code should do it, output format is CSV:

#!/usr/bin/perl
use strict;
use warnings;

use Text::CSV_XS qw( );

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 2 });

my @headers = qw(Foo Bar Baz);
my @columns = (
    [qw/abc def ghi xyz/],
    [qw/def jkl mno uvw xyz/],
    [qw/abc uvw xyz/],
);

my %output;
my $N       = scalar(@columns);
my @new_row = ('') x $N;

foreach my $index (0..$N-1) {
    my $column = $columns[$index];

    foreach my $key (@{ $column }) {
        $output{$key} ||= [ @new_row ];
        $output{$key}->[$index] = $key;
    }
}

$csv->say(\*STDOUT, \@headers);

# NOTE: add {....} after sort for special sorting requirements
foreach my $key (sort keys %output) {
    $csv->say(\*STDOUT, $output{$key});
}

Example output:

$ perl dummy.pl
Foo,Bar,Baz,
abc,,abc,
def,def,,
ghi,,,
,jkl,,
,mno,,
,uvw,uvw,
xyz,xyz,xyz,

NOTE: the above is with the default sort algorithm, but you can insert your own, e.g.

# reverse sort
foreach my $key (sort { $b cmp $a } keys %output) {

Thanks for the question. This was fun :-)


EDIT2: guessing from the format of the question the input might also be in CSV, so a row-based variant of the algorithm might be more appropriate.

#!/usr/bin/perl
use strict;
use warnings;

use Text::CSV_XS qw( );

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 2 });

my $headers;
my @new_row;

my %keys;
my $line = 0;
while ( my $row = $csv->getline(\*STDIN) ) {
    if ($line == 0) {
        $headers = $row;
        @new_row = ('') x @$row;
    } else {
        foreach my $index (0..$#$row) {
            my $key = $row->[$index];
            $keys{$key} ||= [ @new_row ];
            $keys{$key}->[$index] = $key;
        }
    }

    $line++;
}

# delete "undefined" key
delete $keys{''};

$csv->say(\*STDOUT, $headers);
# NOTE: add {....} after sort for special sorting requirements
$csv->say(\*STDOUT, $keys{$_}) foreach (sort keys %keys);

Example output:

$ cat input.csv 
Foo,Bar,Baz
abc,def,abc
def,jkl,uvw
ghi,mno,xyz
xyz,uvw,
,xyz,

$ perl dummy.pl <input.csv 
Foo,Bar,Baz
abc,,abc
def,def,
ghi,,
,jkl,
,mno,
,uvw,uvw
xyz,xyz,xyz

Upvotes: 1

haukex
haukex

Reputation: 3013

Something like this maybe?

use warnings;
use strict;

my @data = (
    { name => 'Foo', funcs => [qw/abc def ghi xyz/] },
    { name => 'Bar', funcs => [qw/def jkl mno uvw xyz/] },
    { name => 'Baz', funcs => [qw/abc uvw xyz/] },
);

my %allfuncs = ( map { map {$_=>undef} @{$$_{funcs}} } @data );
$$_{funcs} = { %allfuncs, map {$_=>1} @{$$_{funcs}} } for @data;

use Data::Dump;
dd @data;

# just for output:
use List::Util qw/max/;
my $maxlen = max map {length} map({$$_{name}} @data), keys(%allfuncs);
my $fmt = join('  ', ("%${maxlen}s") x @data)."\n";
printf $fmt, map { $$_{name} } @data;
for my $f (sort keys %allfuncs) {
    printf $fmt, map { $$_{funcs}{$f}?$f:'' } @data;
}

Output:

(
  {
    funcs => { abc => 1, def => 1, ghi => 1, jkl => undef, mno => undef, uvw => undef, xyz => 1 },
    name  => "Foo",
  },
  {
    funcs => { abc => undef, def => 1, ghi => undef, jkl => 1, mno => 1, uvw => 1, xyz => 1 },
    name  => "Bar",
  },
  {
    funcs => { abc => 1, def => undef, ghi => undef, jkl => undef, mno => undef, uvw => 1, xyz => 1 },
    name  => "Baz",
  },
)
Foo  Bar  Baz
abc       abc
def  def     
ghi          
     jkl     
     mno     
     uvw  uvw
xyz  xyz  xyz

Update: If your input data comes in the form of a AoA, this takes @table and produces the same @data as above (it basically transposes the AoA and then produces the hash structure):

my @table = ( [qw/Foo Bar Baz/], [qw/abc def abc/], [qw/def jkl uvw/],
    [qw/ghi mno xyz/], [qw/xyz uvw/], [undef, qw/xyz/] );
my @data;
for my $col ( 0 .. $table[0]->$#* )
    { push @data, [ map {$_->[$col]//()} @table ] }
@data = map { {name=>shift @$_, funcs=>$_} } @data;

And if you need your output format to be an AoA as well:

my @out = ( [map {$$_{name}} @data] );
for my $f (sort keys %allfuncs)
    { push @out, [ map {$$_{funcs}{$f}?$f:undef} @data ] }

Produces @out:

(
  ["Foo", "Bar", "Baz"],
  ["abc", undef, "abc"],
  ["def", "def", undef],
  ["ghi", undef, undef],
  [undef, "jkl", undef],
  [undef, "mno", undef],
  [undef, "uvw", "uvw"],
  ["xyz", "xyz", "xyz"],
)

Upvotes: 2

Related Questions