Reputation: 313
I have a text file in which the number of spaces at the start of each line acts as the delimiter.
Lines with no initial spaces should go in the first column of the CSV file; those with two spaces should go in the second column of the CSV; and those with four spaces should go in the third column.
This is all working fine as required.
For lines starting with two spaces, I want only the date to go in the second column, discarding the rest of the line. Everything else should remain as it is.
I have denoted the spaces at the start of each line with # for clarity.
Text file:
Component1
##(111) Amar Sen <[email protected]> <No comment> 2013/04/01
####/Com/src/folder1/folder2/newfile.txt
##(1199) Prashant Singh <[email protected]> <No comment> 2013/04/24
####/Com/src/folder1/folder2/testfile24
####/Com/src/folder1/folder2/testfile25
####/Com/src/folder1/folder2/testfile26
##(1204) Anthony Li <[email protected]> <No comment> 2013/04/25
####/Com/src2
Component2(added)
Component3
Output format:
Component1,2013/04/01,/Com/src/folder1/folder2/newfile.txt
2013/04/24,/Com/src/folder1/folder2/testfile24
/Com/src/folder1/folder2/testfile25
/Com/src/folder1/folder2/testfile26
2013/04/25,/Com/src2
Component2(added)
Component3
Here's the code. It's working fine except for the change described above.
use strict;
use warnings;

my $previous_count = "-1"; #beginning, we will think, that no spaces.
my $current_count = "0"; #current default value
my $maximum_count = 3;
my $to_written = "";
my $delimiter_between_columns = ",";
my $newline_separator = ";";
my $file = 'C:\\textfile.txt';

open (my $fh, '<:encoding(UTF-8)', $file) or die "Could not open file '$file' $!";
while (my $row = <$fh>) {
    # ok, read.
    chomp($row);
    # print "row is : $row\n";
    if ($row =~ m/^(\s*)/) {
        #print length($1);
        $current_count = length($1) / 2; #take number of spaces divided by 2
        $row =~ s/^\s+//;
        if ($previous_count >= $current_count || $previous_count == $maximum_count) {
            #output here
            print "$to_written" . $newline_separator . "\n";
            $previous_count = 0;
            $to_written = "";
        }
        $previous_count = 0 if ($previous_count == -1);
        $to_written .= $delimiter_between_columns x ($current_count - $previous_count) . "$row";
        $previous_count = $current_count;
        #print"\n";
    }
}
print "$to_written" . $newline_separator . "\n";
Upvotes: 1
Views: 146
Reputation: 126722
You seem to have got yourself tied up in knots a little with your solution.
This program seems to do what you need. I have added some commas to your "output format" as your example has no placeholders for initial empty fields.
I have kept the hash characters for this purpose. Obviously it is trivial to change them for spaces, replacing s/^(#*)// with s/^(\s*)//.
use strict;
use warnings;

my @row;

while (<DATA>) {
    chomp;
    s/^(#*)//;
    my $i = length($1) / 2;
    if ($i == 1 and m<(\d{4}/\d{2}/\d{2})>) {
        $row[$i] = $1;
    }
    else {
        $row[$i] = $_;
    }
    if ($i == 2) {
        print join(',', @row), ";\n";
        @row = ('') x 3;
    }
}
__DATA__
Component1
##(111) Amar Sen <[email protected]> <No comment> 2013/04/01
####/Com/src/folder1/folder2/newfile.txt
##(1199) Prashant Singh <[email protected]> <No comment> 2013/04/24
####/Com/src/folder1/folder2/testfile24
####/Com/src/folder1/folder2/testfile25
####/Com/src/folder1/folder2/testfile26
##(1204) Anthony Li <[email protected]> <No comment> 2013/04/25
####/Com/src2
Output:
Component1,2013/04/01,/Com/src/folder1/folder2/newfile.txt;
,2013/04/24,/Com/src/folder1/folder2/testfile24;
,,/Com/src/folder1/folder2/testfile25;
,,/Com/src/folder1/folder2/testfile26;
,2013/04/25,/Com/src2;
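The program above prints a row only when it reaches a third-column line, so component lines with nothing under them (such as Component2(added) and Component3 in the question) would not be printed. Here is a minimal sketch of one way to handle that, assuming the input is indented with exactly 0, 2, or 4 real spaces. The helper name indented_to_csv and the buffering with $last_level are illustrative and not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): turn indented lines into CSV rows.
# Assumes indentation is exactly 0, 2, or 4 leading spaces.
sub indented_to_csv {
    my @lines = @_;
    my @out;
    my @row        = ('') x 3;
    my $last_level = -1;    # -1 means nothing is buffered yet

    for my $line (@lines) {
        chomp(my $text = $line);
        $text =~ s/^( *)//;
        my $level = length($1) / 2;

        # Keep only the date on second-column lines, as in the answer.
        if ($level == 1 and $text =~ m{(\d{4}/\d{2}/\d{2})}) {
            $text = $1;
        }

        # A line at the same or a shallower level cannot extend the
        # buffered row, so emit that row and start a fresh one.
        if ($last_level >= 0 and $level <= $last_level) {
            push @out, join(',', @row);
            @row = ('') x 3;
        }

        $row[$level] = $text;
        $last_level  = $level;
    }

    # Emit whatever is still buffered after the last line.
    push @out, join(',', @row) if $last_level >= 0;
    return @out;
}

# Small demonstration with made-up lines in the question's layout.
print "$_;\n" for indented_to_csv(
    "Component1",
    "  (111) Amar Sen <No comment> 2013/04/01",
    "    /Com/src/folder1/folder2/newfile.txt",
    "Component2(added)",
    "Component3",
);
```

With this approach a component that has no children comes out as a row with two empty trailing fields, for example Component2(added),,; so every row keeps three columns.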
Update
It makes more sense to cascade values from columns one and two into subsequent rows where they are not supplied. If you remove the line @row = ('') x 3 from my program it will do just that, with this output:
Component1,2013/04/01,/Com/src/folder1/folder2/newfile.txt;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile24;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile25;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile26;
Component1,2013/04/25,/Com/src2;
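One thing to watch with the cascading version is that a value from an earlier component can carry over into a later one, for instance if a new component has a path but no date line. The following is a rough sketch of a fill-down variant that clears the deeper columns whenever a shallower one changes. It assumes the same # markers as the program above, and the helper name cascade_rows is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: fill-down variant that returns one complete row
# for every third-column line. Indentation is marked with '#' pairs.
sub cascade_rows {
    my @row = ('') x 3;
    my @out;

    for my $line (@_) {
        chomp(my $text = $line);
        $text =~ s/^(#*)//;
        my $level = length($1) / 2;

        # Keep only the date on second-column lines.
        $text = $1 if $level == 1 and $text =~ m{(\d{4}/\d{2}/\d{2})};

        $row[$level] = $text;

        # Clear deeper columns so stale values do not leak into a new
        # component or a new date group.
        $row[$_] = '' for $level + 1 .. 2;

        push @out, join(',', @row) if $level == 2;
    }
    return @out;
}

print "$_;\n" for cascade_rows(
    "Component1",
    "##(111) Amar Sen <No comment> 2013/04/01",
    "####/Com/src/folder1/folder2/newfile.txt",
    "####/Com/src/folder1/folder2/testfile24",
);
```

For the sample data in the answer this should give the same rows as simply removing the reset line; the difference only shows up when a column is missing for a later component.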
Upvotes: 1