Reputation: 840
Consider the following file:
5,*,ABC
6,5,XYZ
7,5,123
4,6,xyz
1,4,xox
8,6,yoy
The format of each line: (*
has no parent)
pid,parent-pid,name
I would like to somehow to create the following file:
ABC,
ABC,XYZ
ABC,123
ABC,XYZ,xyz
ABC,XYZ,xyz,xyx
ABC,XYZ,yoy
Meaning for every PID, I can go to its greatest parent on the same line. I thought to implement it (in Perl) with hash by inserting it into a hash. problem is I don't really know what would be the length of each line and then the length of the hash. Also, I'm looking for the most efficient way possible.
What good algorithm would solve this problem?
Upvotes: 0
Views: 67
Reputation: 16377
I would handle it by storing an array of the parent relationships and then traversing through that array each time a line was read:
my @parent;
open my $IN, '<', 'file' or die;
while (<$IN>) {
chomp;
my ($id, $parent, $name) = split /,/;
$parent[$id] = [ $parent, $name ];
if ($parent eq '*') {
print $name;
} else {
my @output = ( [ $parent, $name ] );
while (my $p = $parent[${$output[0]}[0]]) {
unshift @output, $p;
}
print join ',', map { ${$_}[1] } @output;
}
print "\n";
}
close $IN;
Output:
ABC
ABC,XYZ
ABC,123
ABC,XYZ,xyz
ABC,XYZ,xyz,xox
ABC,XYZ,yoy
-- EDIT -- Per feedback, revised to use hashes and not be dependent on file order:
my %parent;
open my $IN, '<', 'file' or die;
while (<$IN>) {
chomp;
my ($id, $parent, $name) = split /,/;
$parent{$id} = [ $parent, $name ];
}
seek $IN, 0, 0;
while (<$IN>) {
chomp;
my ($id, $parent, $name) = split /,/;
if ($parent eq '*') {
print $name;
} else {
my @output = ( [ $parent, $name ] );
while (my $p = $parent{${$output[0]}[0]}) {
unshift @output, $p;
}
print join ',', map { ${$_}[1] } @output;
}
print "\n";
}
close $IN;
Upvotes: 0
Reputation: 385789
You could build a hash of pids keyed by parent pid.
use feature qw( current_sub );
use Text::CSV_XS qw( );
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 2 });
my %process_children_by_pid;
my %process_name_by_pid;
while (my $row = $csv->getline(*STDIN)) {
my ($pid, $parent, $name) = @$row;
$process_name_by_pid{$pid} = $name;
push @{ $processes_children_by_pid{$parent} }, $pid;
}
sub {
my $pid = pop;
push @_, $process_name_by_pid{$pid};
$csv->say(*STDOUT, \@_);
__SUB__->(@_, $_) for @{ $processes_children_by_pid{$pid} };
}->($_) for @{ $processes_children_by_pid{'*'} };
Or you could use Graph.pm. This has more overhead, but it makes error checking easy.
use feature qw( current_sub );
use Graph qw( );
use Text::CSV_XS qw( );
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 2 });
my $tree = Graph->new();
my %process_name_by_pid;
while (my $row = $csv->getline(*STDIN)) {
my ($pid, $parent, $name) = @$row;
$process_name_by_pid{$pid} = $name;
$tree->add_edge($parent, $pid);
}
die "Bad data" if $tree->has_a_cycle;
my @roots = $tree->predecessorless_vertices();
die "Bad data" if @roots != 1 || $roots[0] ne '*';
sub {
my $pid = pop;
push @_, $process_name_by_pid{$pid};
$csv->say(*STDOUT, \@_);
__SUB__->(@_, $_) for $tree->successors($pid);
}->($_) for $tree->successors('*');
Upvotes: 2