TTaJTa4
TTaJTa4

Reputation: 840

reorganizing a file in a special way [Perl]

Consider the following file:

5,*,ABC
6,5,XYZ
7,5,123
4,6,xyz
1,4,xox
8,6,yoy

The format of each line: (* has no parent)

pid,parent-pid,name

I would like to somehow to create the following file:

ABC,
ABC,XYZ
ABC,123
ABC,XYZ,xyz
ABC,XYZ,xyz,xyx
ABC,XYZ,yoy

Meaning for every PID, I can go to its greatest parent on the same line. I thought to implement it (in Perl) with hash by inserting it into a hash. problem is I don't really know what would be the length of each line and then the length of the hash. Also, I'm looking for the most efficient way possible.

What good algorithm would solve this problem?

Upvotes: 0

Views: 67

Answers (2)

Hambone
Hambone

Reputation: 16377

I would handle it by storing an array of the parent relationships and then traversing through that array each time a line was read:

my @parent;

open my $IN, '<', 'file' or die;
while (<$IN>) {
  chomp;
  my ($id, $parent, $name) = split /,/;
  $parent[$id] = [ $parent, $name ];

  if ($parent eq '*') {
    print $name;
  } else {
    my @output = ( [ $parent, $name ] );

    while (my $p = $parent[${$output[0]}[0]]) {
      unshift @output, $p;
    }

    print join ',', map { ${$_}[1] } @output;
  }

  print "\n";
}
close $IN;

Output:

ABC
ABC,XYZ
ABC,123
ABC,XYZ,xyz
ABC,XYZ,xyz,xox
ABC,XYZ,yoy

-- EDIT -- Per feedback, revised to use hashes and not be dependent on file order:

my %parent;

open my $IN, '<', 'file' or die;
while (<$IN>) {
  chomp;
  my ($id, $parent, $name) = split /,/;
  $parent{$id} = [ $parent, $name ];
}

seek $IN, 0, 0;
while (<$IN>) {
  chomp;
  my ($id, $parent, $name) = split /,/;

  if ($parent eq '*') {
    print $name;
  } else {
    my @output = ( [ $parent, $name ] );

    while (my $p = $parent{${$output[0]}[0]}) {
      unshift @output, $p;
    }

    print join ',', map { ${$_}[1] } @output;
  }

  print "\n";
}
close $IN;

Upvotes: 0

ikegami
ikegami

Reputation: 385789

You could build a hash of pids keyed by parent pid.

use feature qw( current_sub );

use Text::CSV_XS qw( );

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 2 });

my %process_children_by_pid;
my %process_name_by_pid;
while (my $row = $csv->getline(*STDIN)) {
   my ($pid, $parent, $name) = @$row;
   $process_name_by_pid{$pid} = $name;
   push @{ $processes_children_by_pid{$parent} }, $pid;
}

sub {
   my $pid = pop;
   push @_, $process_name_by_pid{$pid};
   $csv->say(*STDOUT, \@_);
   __SUB__->(@_, $_) for @{ $processes_children_by_pid{$pid} };
}->($_) for @{ $processes_children_by_pid{'*'} };

Or you could use Graph.pm. This has more overhead, but it makes error checking easy.

use feature qw( current_sub );

use Graph        qw( );
use Text::CSV_XS qw( );

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 2 });

my $tree = Graph->new();
my %process_name_by_pid;
while (my $row = $csv->getline(*STDIN)) {
   my ($pid, $parent, $name) = @$row;
   $process_name_by_pid{$pid} = $name;
   $tree->add_edge($parent, $pid);
}

die "Bad data" if $tree->has_a_cycle;

my @roots = $tree->predecessorless_vertices();
die "Bad data" if @roots != 1 || $roots[0] ne '*';

sub {
   my $pid = pop;
   push @_, $process_name_by_pid{$pid};
   $csv->say(*STDOUT, \@_);
   __SUB__->(@_, $_) for $tree->successors($pid);
}->($_) for $tree->successors('*');

Upvotes: 2

Related Questions