I am parsing an HTML file whose data is arranged in a grid-like layout, and I am close to being done. I had assumed I could simply remove all blank lines, but I overlooked the fact that some fields in the grid are blank. I am now trying to use the Tie::File module to treat the file as an array and iterate over it. Wherever there are three consecutive blank lines, I want to insert a dummy value that I can manipulate later, so that stripping the blank lines does not alter the structure of my data.
What I have tried so far (the file is ~2 MB):
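use strict;
use warnings;
use Tie::File;

my @lines;
tie @lines, 'Tie::File', 'results.txt'
    or die "Cannot tie results.txt: $!";
(tied @lines)->defer;    # buffer changes in memory until flush

# Tie::File strips the line terminator from each record by default
# (autochomp), so no chomp is needed. Start at 1 and stop one short of
# the end so $i - 1 and $i + 1 always stay inside the array.
for my $i (1 .. $#lines - 1)
{
    # Mark the middle line of three consecutive blank lines
    if ($lines[$i - 1] eq '' && $lines[$i] eq '' && $lines[$i + 1] eq '')
    {
        $lines[$i] = "null";
    }
}

(tied @lines)->flush;
untie @lines;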
Edit: How do I iterate over the array and insert the value so that there is only one blank line between consecutive lines, allowing me to remove all the blank lines later?
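A minimal sketch of that two-step idea, assuming (as the question implies) that fields are separated by a single blank line and that an empty field shows up as three consecutive blank lines. It works on an ordinary array; `mark_empty_fields` is a hypothetical helper name, and the same loop body could run over the array tied with Tie::File instead:

```perl
use strict;
use warnings;

# Hypothetical helper: replace the middle line of every run of three
# consecutive blank lines with a placeholder, so the empty grid cell
# survives when blank lines are stripped afterwards.
sub mark_empty_fields {
    my ($placeholder, @lines) = @_;
    for my $i (1 .. $#lines - 1) {
        if ($lines[$i - 1] eq '' && $lines[$i] eq '' && $lines[$i + 1] eq '') {
            $lines[$i] = $placeholder;
        }
    }
    return @lines;
}

# Assumed layout: the three blank lines between "B" and "C" stand for an empty cell.
my @raw     = ('A', '', 'B', '', '', '', 'C');
my @marked  = mark_empty_fields('null', @raw);
my @cleaned = grep { $_ ne '' } @marked;    # now safe to drop blank lines
print join(',', @cleaned), "\n";            # A,B,null,C
```

Because a replaced line is no longer blank, the check on the next index fails, so a single run of three blank lines yields exactly one placeholder.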
If I understand your problem correctly (replace three consecutive empty lines with the word "null" and an empty line on either side), perhaps the following regex operating on your file's contents will help:
use Modern::Perl;
my $htmlFile = do { local $/; <DATA> };
$htmlFile =~ s/(?<!\S)\n{3}/\nnull\n\n/g;
say $htmlFile;
__DATA__
A
B
C
D
E
F
Output:
null
A
B
null
null
C
D
null
E
F
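To see the substitution's effect without relying on a `__DATA__` section, here is a minimal sketch that wraps the same regex in a sub (the name `mark_blank_runs` is hypothetical) and runs it on a string with explicit `"\n"` escapes. The `(?<!\S)` lookbehind only lets a match start at the beginning of the string or after a whitespace character, so the newline that ends a non-blank line is never consumed:

```perl
use strict;
use warnings;

# Same substitution as in the answer, wrapped so it can run on any string.
sub mark_blank_runs {
    my ($text) = @_;
    $text =~ s/(?<!\S)\n{3}/\nnull\n\n/g;
    return $text;
}

# "B" is followed by three blank lines, which stand for an empty field.
my $input  = "A\n\nB\n\n\n\nC\n";
my $marked = mark_blank_runs($input);
print $marked;    # A, blank, B, blank, null, blank, C

# Once every empty field carries a placeholder, blank lines can be dropped.
my @fields = grep { length } split /\n/, $marked;
print join(',', @fields), "\n";    # A,B,null,C
```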