I have a space-separated file like this:
First Second Third Forth
It is possible to
do this task
with regex but i
don't know how to
My task is to capture all the words of each line and construct a hash from them.
But here is my problem: Fields may be empty in any column (e.g. 3rd line, 3rd field).
Words in each line are aligned with their column's name, either at its beginning or at its end (the column names are the words in the first line: First, Second, Third, Forth).
In my example, words are left-aligned (aligned with the beginning of the column name) in the First, Third and Forth columns, and right-aligned (aligned with the end of the column name) in the Second column.
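The alignment rule can be expressed directly: record where each heading starts and ends, then give each word of a data line to the heading whose start (left-aligned column) or end (right-aligned column) it shares. This is a minimal sketch, assuming every field is a single word with no internal spaces; `spans`, `span_text` and `parse_line` are hypothetical names.

```perl
use strict;
use warnings;

# [start, end) offsets of every run of non-space characters in $line.
sub spans {
    my ($line) = @_;
    my @spans;
    while ($line =~ /\S+/g) {
        push @spans, [ $-[0], $+[0] ];   # match start and match end
    }
    return @spans;
}

# The text covered by one [start, end) span.
sub span_text {
    my ($str, $span) = @_;
    return substr($str, $span->[0], $span->[1] - $span->[0]);
}

# Build a hash for one data line: every heading becomes a key, and a word is
# stored under the first heading whose start or end offset it shares.
# Headings with no aligned word keep the value undef; words aligned with no
# heading are ignored.
sub parse_line {
    my ($header, $line) = @_;
    my @heads = spans($header);
    my %row;
    $row{ span_text($header, $_) } = undef for @heads;
    for my $w (spans($line)) {
        for my $h (@heads) {
            next unless $w->[0] == $h->[0] || $w->[1] == $h->[1];
            $row{ span_text($header, $h) } = span_text($line, $w);
            last;
        }
    }
    return \%row;
}
```

For instance, if `Second` occupies offsets 7 to 13 in the header, a word occupying offsets 11 to 13 ends where `Second` ends, so it is stored under `Second`.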
Using the hash from each line I have to create output formatted like this:
$hash{First} has Second-property $hash{Second}. It also has $hash{Third} and $hash{Forth}.
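Once a row has been turned into a hash, producing the output line is a single `sprintf`. A minimal sketch, assuming the hash keys are the column names from the header and that an empty field should print as an empty string (the question does not say how empty fields should appear); `format_row` is a hypothetical name.

```perl
use strict;
use warnings;

# Format one row hash as the requested sentence. Missing (undef) fields are
# printed as empty strings; this is an assumption, not part of the question.
sub format_row {
    my ($h) = @_;
    my @f = map { $_ // '' } @{$h}{qw(First Second Third Forth)};
    return sprintf "%s has Second-property %s. It also has %s and %s.\n", @f;
}
```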
use strict;
use warnings;
use File::Basename;
use locale;

open my $file,  "<",  $ARGV[0] or die "Cannot open $ARGV[0]: $!";
open my $file2, ">>", fileparse($ARGV[0]) . "2.txt" or die "Cannot open output file: $!";
my @alls = <$file>;
Main();

sub Main {
    my $first = shift @alls;               # header line with the column names
    my $poses = First_And_Last($first);    # [begin, end] offsets of the headings
    my $result_array = [];                 # AoH: one hash per data line
    for my $word (@alls) {
        my $curr_poses = First_And_Last($word);
        my $curr_hash  = Take_Words($poses, $word, $curr_poses);
        push @{$result_array}, $curr_hash;
    }
    return $result_array;
}
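sub First_And_Last {
    # First_And_Last($str): return [begin, end] offsets of every field in $str.
    # A field is a run of non-space characters that may contain single spaces;
    # "#" is matched separately because \b cannot follow it.
    my $str = shift;
    my $ref = [];
    while ($str =~ m/((?:\S\s?)+\b|#)/g) {
        my $end   = pos($str);
        my $begin = $end - length($1);
        push @{$ref}, [$begin, $end];
    }
    return $ref;
}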
sub Take_Words {
    # Take_Words($poses, $line, $current): map the fields of $line to column keys.
    my $ref     = shift;    # [begin, end] offsets of the headings
    my $line    = shift;    # the current line of the file
    my $current = shift;    # [begin, end] offsets of the fields in $line
    my @keys = ('#', qw(Variable Type Len Format Informat Label));
    my $outref = {};
    $outref->{$_} = undef for @keys;      # every column starts out empty
    my @ethalon    = @{$ref};
    my $ethalon    = shift @ethalon;      # current heading
    my $relativity = shift @{$current};   # current field of $line
    my $key        = shift @keys;         # hash key for the current heading
    while (defined($key) && defined($relativity)) {
        # A field belongs to a heading if it is aligned with the heading's
        # beginning (left-aligned) or end (right-aligned).
        if ($ethalon->[0] == $relativity->[0] || $ethalon->[1] == $relativity->[1]) {
            $outref->{$key} = substr($line, $relativity->[0], $relativity->[1] - $relativity->[0]);
            $relativity = shift @{$current};
        }
        $ethalon = shift @ethalon;
        $key     = shift @keys;
    }
    return $outref;
}
Here is my algorithm, but it is somewhat C-ish:
1. Determine the starting position of each column heading and store it.
2. For each column, go to the heading's starting position.
3. Step left until you have passed two consecutive spaces.
4. Go right two characters, then remember the position.
5. Go right until you have passed two consecutive spaces.
6. Go left two characters, then remember the position.
7. Extract everything between the found boundaries.
8. Remove leading and trailing whitespace.
9. Store the result in your hash.
10. Repeat from step 2.
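Steps 2 to 9 can be transcribed almost literally into Perl. This is a sketch, assuming the heading offsets from step 1 are already known and that fields are separated by at least two spaces (not tabs); `field_at` and `parse_row` are hypothetical names. Because the scan starts at the heading's first character, a right-aligned value that is shorter than its heading and does not reach back to that character comes out empty, so such columns may need wider boundaries.

```perl
use strict;
use warnings;

# Steps 2-8 for one column: starting at the heading offset $pos, find the
# surrounding field boundaries (two consecutive spaces, or the line ends),
# extract the text between them and trim it.
sub field_at {
    my ($line, $pos) = @_;
    chomp $line;
    my $n = length $line;
    return '' if $pos >= $n;             # line ends before this column

    # Steps 3-4: step left until two consecutive spaces have been passed,
    # then move right two characters, back onto the first field character.
    my ($left, $run) = ($pos, 0);
    while ($left > 0 && $run < 2) {
        $left--;
        $run = substr($line, $left, 1) eq ' ' ? $run + 1 : 0;
    }
    $left += 2 if $run == 2;

    # Steps 5-6: step right until two consecutive spaces have been passed,
    # then move left two characters, back onto the first space of that gap.
    my $right = $pos;
    $run = 0;
    while ($right < $n && $run < 2) {
        $run = substr($line, $right, 1) eq ' ' ? $run + 1 : 0;
        $right++;
    }
    $right -= 2 if $run == 2;

    # Steps 7-8: extract everything in between and strip outer whitespace.
    my $field = substr($line, $left, $right - $left);
    $field =~ s/^\s+|\s+$//g;
    return $field;
}

# Step 9 for a whole line: one hash entry per heading name.
sub parse_row {
    my ($names, $starts, $line) = @_;
    my %row;
    @row{@$names} = map { field_at($line, $_) } @$starts;
    return \%row;
}
```

Step 10 then amounts to calling `parse_row` once for every remaining line of the file.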
Now we'll have to see about that implementation:
Step 1:
my @starting;
{
    my @char = split m{}, <$file>;   # split the first line into char array
    my $spacecount = 0;
    my $state = 1;                   # 1 : find start -- 0 : find end
    for (my $i = 0; $i < @char; $i++) {
        if ($state) {                # find next non-space
            if ($char[$i] =~ /\s/) {
                next;
            } else {
                $state = not $state; # flip
                $spacecount = 0;
                push @starting, $i;
                next;
            }
        } else {
            if ($char[$i] =~ /\s/) {
                $spacecount++;
                if ($spacecount >= 2) {
                    $state = not $state; # flip
                    next;
                }
            } else {
                $spacecount = 0;     # reset consecutive space counter
                next;
            }
        }
    }
}
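The same starting positions can also be collected with one global match and the `@-` array, using the same rule as the loop above: a heading starts at the first non-space character of the line, or at a non-space character preceded by at least two whitespace characters. A sketch; `heading_starts` is a hypothetical name.

```perl
use strict;
use warnings;

# Offsets at which column headings start: the first non-space character of
# the line, and every non-space character preceded by two or more whitespace
# characters.
sub heading_starts {
    my ($line) = @_;
    my @starting;
    while ($line =~ /(?:^\s*|\s\s)(\S)/g) {
        push @starting, $-[1];   # $-[1] is the offset where group 1 matched
    }
    return @starting;
}
```

It would be called as `my @starting = heading_starts(scalar <$file>);`. The `scalar` matters: a sub call puts `<$file>` in list context, which would read the whole file instead of just the header line.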