Reputation: 81
I'm trying to parse a very large list where some fields are blank, so those rows are not needed. What I want to do is identify the rows where a certain field is blank and omit them. Can anyone help? I am new to Perl and am not sure if I should use split, join, or neither.
id  name  food     drink
1   joe   chips    pop
2   jack  chicken  beer
3   josh  pizza    beer
4   jim            beer
5   john  cookies  milk
This is an example table that is tab (\t) delimited. Notice that row 4 (jim) is missing a food item. Because of that, I want to delete the entire row. I'm not even sure where to begin on this one. I was hoping an expert would have a solution for this scenario.
Upvotes: 0
Views: 480
Reputation: 107080
Okay, not doing a one liner...
split takes a string and breaks it apart, so each item becomes an element of the array it returns. join goes the other way: it combines the elements of an array into a single string.
Note that I use /\s+/ for my split. This matches any contiguous run of whitespace, so it works with a single tab, with two tabs entered to keep things looking nice, or with a stray space or two typed before the tab key.
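To make the difference concrete, here is a minimal sketch of split and join on one row. It assumes the blank food field appears in the raw file as two consecutive tabs; the question does not show the raw bytes, so that layout is an assumption. Under that assumption, /\s+/ swallows the empty field, which is what lets the field-count test in this answer catch the row, while /\t/ keeps it as an empty string.

```perl
use strict;
use warnings;

# Assumed raw form of row 4: the food field is empty, so two tabs sit side by side.
my $row = "4\tjim\t\tbeer";

# /\s+/ treats the run of two tabs as one separator, so the empty field vanishes.
my @by_space = split /\s+/, $row;

# /\t/ splits at every single tab, so the empty field survives as ''.
my @by_tab = split /\t/, $row;

print scalar(@by_space), " fields with /\\s+/\n";   # 3
print scalar(@by_tab),   " fields with /\\t/\n";    # 4

# join goes the other way: array elements back into one string.
print join(',', @by_tab), "\n";                     # 4,jim,,beer
```

One consequence of splitting on /\s+/ is that a field containing a space (say, a two-word food name) would be split in two, so it suits data like this sample where every value is a single word.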
You want to make sure all of your lines have four elements. Since we split the line into an array, we can test to make sure that the array for that line has four elements in it. If not, we can skip it.
To test how many elements an array has, you simply use the array in a scalar context. My comparison next if @array < 4; is doing just that. The next skips to the next iteration of my loop without going through the rest of the loop code. You'll commonly see next if... or next unless... statements in Perl. It's a great way to skip over lines in a file or array that don't match your criteria.
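The scalar-context count and the next if idiom can be sketched on their own as follows. The complete_rows helper and the in-memory sample rows are illustrative assumptions, not part of the script in this answer.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the rows that have at least four fields.
sub complete_rows {
    my @lines = @_;
    my @kept;
    for my $line (@lines) {
        my @fields = split /\s+/, $line;
        # In a numeric comparison @fields is in scalar context,
        # so it evaluates to the number of elements.
        next if @fields < 4;    # skip the rest of the loop body for short rows
        push @kept, $line;
    }
    return @kept;
}

my @kept = complete_rows( "1 joe chips pop", "4 jim beer", "5 john cookies milk" );
print "$_\n" for @kept;    # prints the rows for joe and john
```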
The __DATA__ token is a neat trick in Perl. All lines after __DATA__ are treated as a file. When I read from <DATA>, it's as if I were reading from a file.
use strict;
use warnings;
use autodie;
use feature qw(say);
for my $line ( <DATA> ) {
    chomp $line;    # Always "chomp" right after a read
    my @array = split /\s+/, $line;
    next if ( @array < 4 );
    printf "%-2.2s %-10.10s %-10.10s %-10.10s\n", @array;
}
__DATA__
id name food drink
1 joe chips pop
2 jack chicken beer
3 josh pizza beer
4 jim beer
5 john cookies milk
This will print out:
id name       food       drink
1  joe        chips      pop
2  jack       chicken    beer
3  josh       pizza      beer
5  john       cookies    milk
Upvotes: 2
Reputation: 126742
This is very straightforward using autosplit in a command-line program, like this:
perl -aF\t -ne "print if $F[3] =~ /\S/" milk.txt
output
1 joe chips pop
2 jack chicken beer
3 josh pizza beer
5 john cookies milk
I've assumed the numbers at the start of each line are part of the data.
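For readers new to these switches: -n wraps the code in a loop over the input lines, -a splits each line into @F, and -F sets the split pattern. The double quotes suit the Windows command prompt; in a POSIX shell the shell would expand $F inside double quotes, so single quotes are needed there. The long-hand sketch below uses a hypothetical keep_line helper and two assumed layouts for row 4, since the raw file is not shown. It suggests that testing $F[3] drops the row only when the blank field is absent altogether; if the blank is kept as an empty slot between two tabs, the third field, $F[2], is the one to test.

```perl
use strict;
use warnings;

# Hypothetical long-hand form of: perl -aF\t -ne "print if $F[3] =~ /\S/"
sub keep_line {
    my ( $line, $index ) = @_;
    my @F = split /\t/, $line;    # roughly what -a with -F\t does for each line
    return defined $F[$index] && $F[$index] =~ /\S/ ? 1 : 0;
}

my $short = "4\tjim\tbeer\n";      # assumed layout A: the blank field is simply absent
my $empty = "4\tjim\t\tbeer\n";    # assumed layout B: the blank field is an empty slot

print keep_line( $short, 3 ), "\n";    # 0: there is no 4th field, so the row is dropped
print keep_line( $empty, 3 ), "\n";    # 1: the 4th field is "beer\n", so the row is kept
print keep_line( $empty, 2 ), "\n";    # 0: the 3rd field is empty, so the row is dropped
```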
Upvotes: 1
Reputation: 50667
This skips lines that have at least one empty field:
perl -F'\t' -wane 'print if !grep !length, @F' file
# or
# perl -F'\t' -wane 'print unless grep !length, @F' file
output
1 joe chips pop
2 jack chicken beer
3 josh pizza beer
5 john cookies milk
Or, to check only the third column:
perl -F'\t' -wane 'print if length($F[2])' file
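One edge case worth knowing, assuming the file could also have a blank last column: without -l the last element of @F still carries the newline, so its length is never zero, and split drops trailing empty fields unless it is given a negative limit. A minimal sketch of the same grep !length test that covers this case is below; the empty_fields helper and the sample rows are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: count the empty fields in one tab-separated line.
sub empty_fields {
    my ($line) = @_;
    chomp $line;                          # drop the newline so it does not pad the last field
    my @F = split /\t/, $line, -1;        # limit -1 keeps trailing empty fields
    return scalar grep { !length } @F;    # grep in scalar context returns the match count
}

print empty_fields("1\tjoe\tchips\tpop\n"), "\n";    # 0
print empty_fields("4\tjim\t\tbeer\n"),     "\n";    # 1
print empty_fields("5\tjohn\tcookies\t\n"), "\n";    # 1
```

A row would then be printed only when empty_fields returns 0, which mirrors print unless grep !length, @F in the one-liner.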
Upvotes: 1