Reputation: 216
I am a newbie to Perl. I have a file that contains data in a tree format, as shown below. I need to parse a large amount of such data and generate a .TSV file from it. The format of the file is:
A
|
|--B
|  |
|  |--C
|     |
|     |---PQR
|     |---XYZ
|--D
|  |
|  |---LMN
|---XYZ
The output I need is in tab-separated format, with each leaf node in the last column:
Column1   Column2   Column3   Column4
A         B         C         PQR
A         B         C         XYZ
A         D                   LMN
A                             XYZ
I have written some code, but it does not work for intermediate nodes. Here B is an intermediate node with no leaf of its own, so it is missing from the output, and the leaf node attached directly to the root does not come out correctly either. I read the input file name from the command line.
#!/usr/bin/perl
use Data::Dumper;
open (MYFILE, "<", $ARGV[0]);
my $content = "";
while(<MYFILE>)
{
    my $line = $_;
    $content = $content.$line;
}
my ($root, @block) = split(/\|--(\w)/, $content);
$root =~ s/.*?(\w+).*/$1/is;
my %block = @block;
print "\nColoum1\tColumn2\tColumn3\tColumn4";
foreach my $key( keys %block)
{
    my $content = $block{$key};
    my (undef, @lines) = split(/\n/, $content);
    foreach my $line (@lines)
    {
        if($line =~ /---(\w+)/is)
        {
            my $val = $1;
            if(defined $val)
            {
                print "\n$root\t$key\t$val";
            }
        }
    }
}
The output that I got from this is
Coloum1 Column2 Column3 Column4
A D LMN
A D XYZ
A C PQR
A C XYZ
Is there something I am missing in this code? Can you guide me toward a solution?
Is there any CPAN library that can help me handle this kind of problem?
Upvotes: 2
Views: 863
Reputation: 241988
My attempt:
#!/usr/bin/perl
use warnings;
use strict;
use Test::More tests => 1;
my $input = 'A
|
|--B
|  |
|  |--C
|     |
|     |---PQR
|     |---XYZ
|--D
|  |
|  |---LMN
|---XYZ
';
open my $IN, '<', \$input or die $!;
my @path;
my @output;
my $size = 0;
while (<$IN>) {
    if (!/\|/) {                            # Root.
        @path = [0, /(\S+)/];
    } elsif (/\|(?=-)/g) {                  # Capture the position of the last |.
        if ($path[-1][0] == pos) {          # Sibling.
            ($path[-1][1]) = /-+(\S+)/;
        } elsif ( $path[-1][0] < pos) {     # Child.
            push @path, [pos, /-+(\S+)/];
        } else {                            # New branch.
            pop @path until $path[-1][0] == pos;
            $path[-1] = [pos, /-+(\S+)/];
        }
        if (/---/) {                        # Leaf: record the full path.
            push @output, [ map $_->[1], @path ];
            $size = @path if @path > $size;
        }
    }
}
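# Tabs are written as \t so the empty columns stay visible.
my $expected = "Column1\tColumn2\tColumn3\tColumn4\n"
             . "A\tB\tC\tPQR\n"
             . "A\tB\tC\tXYZ\n"
             . "A\tD\t\tLMN\n"
             . "A\t\t\tXYZ\n";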
my $output = join "\t", map "Column$_", 1 .. $size;
for my $row (@output) {
    $output .= "\n";
    # Pad with empty fields so the leaf always lands in the last column.
    $output .= join "\t", @{$row}[0 .. $#{$row} - 1],
                          (q()) x ($size - @$row),
                          $row->[-1];
}
$output .= "\n";
is($output, $expected);
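The loop above decides between sibling, child and new branch by comparing the column where each branch marker sits. Here is a minimal sketch of just that step, assuming deeper levels are indented further to the right in the input file. The name branch_depth is a hypothetical helper for illustration, not part of the script above.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper (an assumption for illustration): returns the offset
# just past the '|' that is directly followed by '-', or undef when the
# line is only vertical filler such as "|  |".
sub branch_depth {
    my ($line) = @_;
    # A /g match in scalar context records where it ended in pos($line).
    return $line =~ /\|(?=-)/g ? pos($line) : undef;
}

for my $line ('|--B', '|  |--C', '|     |---PQR', '|  |') {
    my $depth = branch_depth($line);
    printf "%-15s %s\n", $line, defined $depth ? $depth : 'no branch';
}
```

A larger offset than the previous node means a child, an equal offset means a sibling, and a smaller one means the path has to be popped back to that level.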
Upvotes: 4