Reputation: 27249
I have the following code:
#!/usr/bin/perl
# splits.pl
use strict;
use warnings;
use diagnostics;
my $pivotfile = "myPath/Internal_Splits_Pivot.txt";
open PIVOTFILE, $pivotfile or die $!;
while (<PIVOTFILE>) { # loop through each line in file
next if ($. == 1); # skip first line (contains business segment code)
next if ($. == 2); # skip second line (contains transaction amount text)
my @fields = split('\t',$_); # split fields for line into an array
print scalar(grep $_, @fields), "\n";
}
Given that the data in the text file is this:
4 G I M N U X
Transaction Amount Transaction Amount Transaction Amount Transaction Amount Transaction Amount Transaction Amount Transaction Amount
0000-13-I21 600
0001-8V-034BLA 2,172 2,172
0001-8V-191GYG 13,125 4,375
0001-9W-GH5B2A -2,967.09 2,967.09 25.00
I would expect the output from the Perl script to be: 2 3 3 4
given the number of defined elements in each line. The file is a tab-delimited text file with 8 columns.
Instead I get 3 4 3 4
and I have no idea why!
For background, I am using "Counting array elements in Perl" as the basis for my development, as I am trying to count the number of elements in each line to know whether I need to skip that line.
Upvotes: 1
Views: 1432
Reputation: 8548
The data contains spaces as well as tabs.
Splitting on whitespace works; see below.
#!/usr/bin/perl
# splits.pl
use strict;
use warnings;
use diagnostics;
while (<DATA>) { # loop through each line in file
next if ($. == 1); # skip first line (contains business segment code)
next if ($. == 2); # skip second line (contains transaction amount text)
my @fields = split(" ",$_); # split on runs of whitespace (the special " " pattern)
print scalar(@fields), "\n";
}
__DATA__
4 G I M N U X
Transaction Amount Transaction Amount Transaction Amount Transaction Amount Transaction Amount Transaction Amount Transaction Amount
0000-13-I21 600
0001-8V-034BLA 2,172 2,172
0001-8V-191GYG 13,125 4,375
0001-9W-GH5B2A -2,967.09 2,967.09 25.00
Output
2
3
3
4
Upvotes: 2
Reputation: 27249
A lot of great help on this question, and quickly too!
After a long, drawn-out learning process, this is what I came up with that worked quite well, with intended results.
#!/usr/bin/perl
# splits.pl
use strict;
use warnings;
use diagnostics;
my $pivotfile = "myPath/Internal_Splits_Pivot.txt";
open PIVOTFILE, $pivotfile or die $!;
while (<PIVOTFILE>) { # loop through each line in file
next if ($. == 1); # skip first line (contains business segment code)
next if ($. == 2); # skip second line (contains transaction amount text)
chomp $_; # remove the trailing newline (chomp does not strip other whitespace)
my @fields = split(/\t/,$_); # split fields for line into an array
print scalar(grep $_, @fields), "\n";
}
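To illustrate why the chomp makes a difference, here is a small sketch. The sample line is made up (it assumes values in columns 1 and 4 and empty trailing columns) and is not copied from the real file:

```perl
use strict;
use warnings;

# Made-up line: values in columns 1 and 4, the remaining columns empty.
my $line = "0000-13-I21\t\t\t600\t\t\t\t\n";

# Without chomp the last field is "\n", which is a true value.
my @raw = split /\t/, $line;

# With chomp the trailing fields are empty and split drops them.
my $chomped = $line;
chomp $chomped;
my @clean = split /\t/, $chomped;

print scalar(grep $_, @raw),   "\n";   # 3
print scalar(grep $_, @clean), "\n";   # 2
```

The extra "field" in the first count is just the newline at the end of the line.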
Upvotes: 0
Reputation: 2668
As a side note, regarding this part of your question:
"For background, I am using Counting array elements in Perl as the basis for my development, as I am trying to count the number of elements in the line to know if I need to skip that line or not."
Now I understand why you use grep to count array elements. That's important when your array contains undefined values like here:
my @a;
$a[1] = 42; # @a contains the list (undef, 42)
say scalar @a; # 2
or when you manually deleted entries:
my @a = split /,/ => 'foo,bar'; # @a contains the list ('foo', 'bar')
delete $a[0]; # @a contains the list (undef, 'bar')
say scalar @a; # 2
But in many cases, especially when you're using arrays just to store a list without operating on single array elements, scalar @a works perfectly fine.
my @a = (1 .. 17, 1 .. 25); # (1, 2, ..., 17, 1, 2, .., 25)
say scalar @a; # 42
It's important to understand what grep does! In your case
print scalar(grep $_, @fields), "\n";
grep returns the list of true values of @fields and then you print how many you have. But sometimes this isn't what you want/expect:
my @things = (17, 42, 'foo', '', 0); # even '' and 0 are things
say scalar grep $_ => @things; # 3!
Because the empty string and the number 0 are false values in Perl, they won't get counted with that idiom. So if you want to know how long an array is, just use
say scalar @array; # number of array entries
If you want to count true values, use this
say scalar grep $_ => @array; # number of true values
But if you want to count defined values, use this
say scalar grep defined($_) => @array; # number of defined values
I'm pretty sure you already know this from the other answers on the linked page. In hashes, the situation is a little bit more complex because setting something to undef is not the same as deleting it:
my %h = (a => 0, b => 42, c => 17, d => 666);
$h{c} = undef; # still there, but undefined
delete $h{d}; # BAM! $h{d} is gone!
What happens when we try to count values?
say scalar grep $_ => values %h; # 1
because 42 is the only true value in %h.
say scalar grep defined $_ => values %h; # 2
because 0 is defined although it's false.
say scalar grep exists $h{$_} => qw(a b c d); # 3
because undefined values can exist.
Conclusion: know what you're doing instead of copy'n'pasting code snippets :)
Upvotes: 2
Reputation: 2668
The problem should be in this line:
my @fields = split('\t',$_); # split fields for line into an array
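The single-quoted '\t' doesn't get interpolated as a string, although split still treats its first argument as a pattern, so the regex \t does match a tab; writing /\t/ makes that explicit. And your file doesn't seem to be tab-only separated, at least here on SO. I changed the split regex to match arbitrary whitespace, ran the code on my machine and got the "right" result: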
my @fields = split(/\s+/,$_); # split fields for line into an array
Result:
2
3
3
4
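One caveat with splitting on whitespace: runs of tabs collapse into a single separator, so empty columns disappear and the column position of each value is lost. A small sketch of the difference, using a made-up line (an assumption for illustration, not the real file contents):

```perl
use strict;
use warnings;

# Made-up tab-delimited line: 8 columns, values in columns 1, 3 and 8.
my $line = "0001-8V-034BLA\t\t2,172\t\t\t\t\t2,172\n";
chomp $line;

my @by_tab   = split /\t/,  $line;   # empty columns stay in place
my @by_space = split /\s+/, $line;   # runs of whitespace collapse

print scalar(@by_tab),   "\n";                  # 8
print scalar(@by_space), "\n";                  # 3
print scalar(grep { length } @by_tab), "\n";    # 3
```

If only the count of filled fields matters, both approaches give the same number here; if the column a value sits in matters, splitting on /\t/ and filtering afterwards keeps that information.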
Upvotes: 2
Reputation: 47762
Your code works for me. The problem may be that the input file contains some "hidden" whitespace fields (e.g. whitespace other than tabs). For instance,
A<tab><space><CR>
gives two fields, A and <space><CR>, while
A<tab>B<tab><CR>
gives three: A, B and <CR> (remember, the end of line is part of the input!).
I suggest you chomp every line you read; beyond that, you will have to clear whitespace-only fields out of the array. E.g.
scalar(grep /\S/, @fields)
should do it.
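A minimal sketch of that suggestion, assuming tab-delimited lines. The helper name count_filled_fields and the sample lines are made up for illustration and are not taken from the real file:

```perl
use strict;
use warnings;

# Hypothetical helper: count the fields that contain at least one
# non-whitespace character.
sub count_filled_fields {
    my ($line) = @_;
    chomp $line;                      # drops the trailing newline only
    my @fields = split /\t/, $line;   # split on single tab characters
    return scalar grep { /\S/ } @fields;
}

# Made-up sample lines, not the real file contents.
my @samples = (
    "A\t \r\n",         # stray space and carriage return after the tab
    "A\tB\t\n",         # trailing tab before the newline
    "A\t\t600\t\t\n",   # empty columns between and after the values
);

print count_filled_fields($_), "\n" for @samples;   # prints 1, 2, 2
```

Because /\S/ requires a non-whitespace character, a field holding only a space or a carriage return is not counted, whereas grep $_ would treat it as true.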
Upvotes: 1
Reputation: 98388
I suspect you have spaces mixed with the tabs in some places, and your grep test will consider " " true.
What does:
use Data::Dumper;
$Data::Dumper::Useqq=1;
print Dumper [<PIVOTFILE>];
show?
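For reference, a self-contained sketch of what that check reveals. It uses made-up in-memory lines in place of the real filehandle, so the data here is an assumption:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq = 1;   # print control characters as escapes

# Made-up lines standing in for what <PIVOTFILE> would return.
my @lines = ("A\t \r\n", "B\t\t600\n");

my $dump = Dumper(\@lines);
print $dump;
```

With Useqq set, tabs, carriage returns and newlines show up as \t, \r and \n inside double-quoted strings, so a stray space next to a tab becomes easy to spot.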
Upvotes: 2