joshE

Reputation: 81

perl conditional split/join

I'm trying to parse a very large list where some fields are blank, and rows with a blank field aren't needed. What I want to do is identify the rows where a certain field is blank and omit them. Can anyone help? I'm new to Perl and not sure whether I should use split, join, or both.

id  name    food    drink
1   joe chips   pop
2   jack    chicken beer
3   josh    pizza   beer
4   jim     beer
5   john    cookies milk

This is an example table that is tab (\t) delimited. Notice that row #4 (Jim) is missing a food item. Because of that, I want to delete the entire row. I'm not even sure where to begin on this one, and I was hoping an expert would have a solution for this scenario.

Upvotes: 0

Views: 480

Answers (3)

David W.

Reputation: 107080

Okay, not doing a one-liner...

split takes a string and breaks it apart, so each item becomes an element in the array it returns. join goes the other way: it joins the elements of an array into a single string.
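A minimal sketch of the two functions on one row of the question's table, assuming the fields are separated by real tab characters:

```perl
use strict;
use warnings;

# One row from the question's table, assuming real tab characters.
my $line = "1\tjoe\tchips\tpop";

# split breaks the string apart at each tab; every field becomes an element.
my @fields = split /\t/, $line;
print scalar(@fields), " fields: @fields\n";    # 4 fields: 1 joe chips pop

# join goes the other way, gluing the elements back into one string.
my $rejoined = join "\t", @fields;
print "round trip ok\n" if $rejoined eq $line;

# Any separator works; here the same fields become a comma-separated line.
my $csv = join ',', @fields;
print "$csv\n";    # 1,joe,chips,pop
```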

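Note that I use /\s+/ for my split. This matches any run of consecutive whitespace, so it works with a single tab, with two tabs a person entered to keep things looking nice, or with a space or two someone accidentally typed before pressing the tab key. It also means that an empty field between two tabs simply disappears, so a row with a blank field comes back with fewer elements.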
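To see why /\s+/ makes the element count work, and where it can bite, here is a small sketch comparing it with splitting on a single tab. It assumes the real file contains actual tab characters and that a blank field is simply two adjacent tabs; the "ice cream" row is a made-up example, not part of the question's data.

```perl
use strict;
use warnings;

# Jim's row as it would look in a tab-delimited file: the food field is
# empty, so two tabs sit next to each other.
my $jim = "4\tjim\t\tbeer";

# /\s+/ treats the two adjacent tabs as one separator, so the empty
# field vanishes and only 3 elements come back.
my @by_whitespace = split /\s+/, $jim;

# /\t/ splits at every single tab, so the empty field survives as ''.
my @by_tab = split /\t/, $jim;

printf "split /\\s+/ gives %d fields\n", scalar @by_whitespace;    # 3
printf "split /\\t/ gives %d fields\n",  scalar @by_tab;           # 4
printf "food via /\\t/ is %s\n", length( $by_tab[2] ) ? $by_tab[2] : 'empty';

# Caveat (made-up row): /\s+/ also splits inside a value containing a space.
my @spaced = split /\s+/, "6\tjane\tice cream\tsoda";
printf "'ice cream' row gives %d fields with /\\s+/\n", scalar @spaced;    # 5
```

If a field can legitimately contain spaces, splitting on /\t/ and testing the specific field is the safer choice.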

You want to make sure all of your lines have four elements. Since we split the line into an array, we can test to make sure that the array for that line has four elements in it. If not, we can skip it.

To test how many elements an array has, you simply use the array in scalar context. My comparison next if @array < 4; does just that. The next skips to the next iteration of the loop without running the rest of the loop body. You'll commonly see next if ... or next unless ... statements in Perl; they're a great way to skip over lines in a file or array that don't match your criteria.
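A small sketch of both ideas. The rows here are made-up array refs standing in for the result of split, with Jim's row one element short:

```perl
use strict;
use warnings;

# Rows already split into fields; Jim's row is one element short.
my @rows = ( [ 1, 'joe', 'chips', 'pop' ], [ 4, 'jim', 'beer' ] );

for my $row (@rows) {
    my @array = @$row;

    # Assigning an array to a scalar puts it in scalar context: the element count.
    my $count = @array;
    print "row $array[0] has $count elements\n";
}

# Skip short rows with "next if".
my @kept_if;
for my $row (@rows) {
    next if @$row < 4;    # the numeric comparison also uses scalar context
    push @kept_if, $row->[1];
}

# The same filter written with "next unless".
my @kept_unless;
for my $row (@rows) {
    next unless @$row >= 4;
    push @kept_unless, $row->[1];
}

print "kept: @kept_if\n";    # kept: joe
```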

__DATA__ is a neat trick in Perl: everything after the __DATA__ line is treated as the contents of a file. Reading from <DATA> works just like reading from a file handle.

use strict;
use warnings;
use autodie;
use feature qw(say);

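while ( my $line = <DATA> ) {   # read one line at a time instead of loading the whole file
    chomp $line;       # Always "chomp" right after a read
    my @array = split /\s+/, $line;
    next if (@array < 4 );
    # %-2.2s truncates the id to 2 characters; widen it if ids reach 100
    printf "%-2.2s  %-10.10s  %-10.10s  %-10.10s\n", @array;
}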

__DATA__
id  name    food    drink
1   joe chips   pop
2   jack    chicken beer
3   josh    pizza   beer
4   jim     beer
5   john    cookies milk

This will print out:

id  name        food        drink
1   joe         chips       pop
2   jack        chicken     beer
3   josh        pizza       beer
5   john        cookies     milk

Upvotes: 2

Borodin

Reputation: 126742

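This is very straightforward using autosplit in a command-line program, like this:

perl -aF\t -ne "print if $F[3] =~ /\S/" milk.txt

@F is zero-based, so $F[3] is the fourth field (drink); the test skips a row that ends up one field short. If a blank food field is instead stored as an empty string between two tabs, test $F[2]. The double quotes suit the Windows command prompt; in a Unix shell, use single quotes instead: perl -aF'\t' -ne 'print if $F[3] =~ /\S/' milk.txt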

output

1       joe     chips   pop
2       jack    chicken beer
3       josh    pizza   beer
5       john    cookies milk

I've assumed the numbers at the start of each line are part of the data.
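To make the flags less magical, here is a rough long-hand version of what the one-liner does for each line. It is a hand-written sketch, not the exact code perl generates (running perl -MO=Deparse with the same flags shows that); keep_line and its column parameter are made up for illustration. The two made-up Jim rows show that which index to test depends on how a blank field looks in the file: $F[3] catches a row that is one field short, while $F[2] catches an empty string between two tabs.

```perl
use strict;
use warnings;

# keep_line is a hypothetical helper: roughly what the -n, -a and -F flags
# do with the one-liner's test for a single input line.
#   -n  loops over each input line, putting it in $_
#   -a  autosplits that line into @F
#   -F  sets the pattern autosplit uses (a tab here)
sub keep_line {
    my ( $line, $col ) = @_;
    my @F = split /\t/, $line;

    # @F is zero-based. A field that is missing entirely is undef, and an
    # empty or whitespace-only field fails /\S/, so both are rejected.
    return defined $F[$col] && $F[$col] =~ /\S/;
}

# Jim's row in two possible shapes: one field short, or with an empty
# food field between two tabs.
my @input = (
    "1\tjoe\tchips\tpop\n",
    "4\tjim\tbeer\n",
    "4\tjim\t\tbeer\n",
);

for my $col ( 2, 3 ) {
    my @kept = grep { keep_line( $_, $col ) } @input;
    printf "testing \$F[%d] keeps %d of %d lines\n", $col, scalar @kept, scalar @input;
}
```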

Upvotes: 1

mpapec

Reputation: 50667

This skips lines that have at least one empty value:

perl -F'\t' -wane 'print if !grep !length, @F' file
# or
# perl -F'\t' -wane 'print unless grep !length, @F' file

output

1       joe     chips   pop
2       jack    chicken beer
3       josh    pizza   beer
5       john    cookies milk

Or, to check only the third column (food):

perl -F'\t' -wane 'print if length($F[2])' file
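The grep test in the first command can be spelled out as a small script. has_empty_field is a hypothetical name, and the sample rows assume real tabs with Jim's food field left as an empty string between two tabs. Because grep in scalar context returns the number of matches, print if !grep ... and print unless grep ... are the same test.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the one-liner's test for one line's fields.
sub has_empty_field {
    my @F = @_;

    # grep picks out the fields whose length is 0; assigned to a scalar it
    # returns how many there were, so any non-zero count means "skip".
    my $empty = grep { !length($_) } @F;
    return $empty > 0;
}

my @joe = split /\t/, "1\tjoe\tchips\tpop";
my @jim = split /\t/, "4\tjim\t\tbeer";

print has_empty_field(@joe) ? "skip joe\n" : "print joe\n";    # print joe
print has_empty_field(@jim) ? "skip jim\n" : "print jim\n";    # skip jim
```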

Upvotes: 1
