Reputation: 1378
I have a large text file that I would like to filter by excluding lines in which a certain number of columns match a specific character. I had previously removed lines where all columns from 2 onwards contained a 0 or a . like so:
awk '{
for (i=2; i<=NF; i++)
if ($i!~/^(\.|0)/) {
print
break
}
}'
but now I would like to print only the lines that have fewer than a specific number of columns with this value (".").
For example with data:
A B C D E
0 1 . 0 0
1 ./. 0 1 1
1 1 0 0 0
0 0 . . 0
. ./. . . .
and a match value of 2 I would expect the bottom two lines to be excluded so that the output would be:
A B C D E
0 1 . 0 0
1 ./. 0 1 1
1 1 0 0 0
Any ideas?
Upvotes: 1
Views: 247
Reputation: 1517
Perhaps this is alright.
awk '$0 !~/\. \./' file
A B C D E
0 1 . 0 0
1 ./. 0 1 1
1 1 0 0 0
Upvotes: -1
Reputation: 47119
With awk:
$ awk '{c=0;for(i=1;i<=NF;i++) c += ($i == ".")}c<2' file
A B C D E
0 1 . 0 0
1 ./. 0 1 1
1 1 0 0 0
Basically it iterates over each column and adds one to the counter if the column equals a period (.). The c<2 part will only print the line if there are fewer than two columns with periods.
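If the threshold should not be hard-coded, the same counting loop can be wrapped so the limit is passed in. This is only a minimal sketch: the function name filter_dots and the awk variable max are made up for illustration, and the loop runs up to and including NF so the last column is counted too.

```shell
# Hypothetical helper: print lines with fewer than "max" fields equal to "."
filter_dots() {
  awk -v max="$1" '{
    c = 0
    for (i = 1; i <= NF; i++)   # <= NF so the last field is counted too
      if ($i == ".") c++
    if (c < max) print          # keep the line only while under the limit
  }'
}

# Usage with the sample data from the question and a match value of 2
printf '%s\n' 'A B C D E' '0 1 . 0 0' '1 ./. 0 1 1' '1 1 0 0 0' \
  '0 0 . . 0' '. ./. . . .' | filter_dots 2
```

With the sample data this should keep the header and the first three data lines, the same as the output shown above.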
With sed one can use:
$ sed -r 'h;s/[^. ]+//g;s/\.\. *//g;/\. +\./d;x' file
A B C D E
0 1 . 0 0
1 ./. 0 1 1
1 1 0 0 0
-r enables extended regular expressions (-E on *BSD).
Basically a copy of the pattern space is stored in the hold buffer (h), then everything except spaces and periods is removed, and the .. left over from any ./. column is dropped as well.
Now if the pattern space still contains two separate periods the line is deleted (d); if not, the pattern space is exchanged with the hold buffer (x) so the original line is printed.
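To see what the final test actually looks at, it can help to run only the two substitutions and make the remaining spaces visible. This is just an illustrative sketch: reduce_line is a made-up name, -E is used in place of -r, and tr turns spaces into underscores purely for display.

```shell
# Hypothetical helper: apply only the two substitutions from the sed answer
reduce_line() {
  sed -E 's/[^. ]+//g; s/\.\. *//g'
}

# Show the reduced pattern space, with spaces drawn as underscores
printf '%s\n' '0 0 . . 0' '1 ./. 0 1 1' | reduce_line | tr ' ' '_'
# prints:
# __._._
# _
```

The first line still has two lone periods, so it would be deleted; the second has none left once the ./. residue is gone, so it would be printed.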
Upvotes: 3
Reputation: 37424
$ awk '{delete a; for(i=1;i<=NF;i++) a[$i]++; if(a["."]>=2) next} 1' foo
A B C D E
0 1 . 0 0
1 ./. 0 1 1
1 1 0 0 0
It iterates over all fields (for), counts the field values and, if there are 2 or more . in a record, refrains from printing (next). If you want to count the periods only from field 3 onward, change the start value of i in the for: for(i=3; ...).
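The start field can also be passed in rather than edited by hand. A rough sketch, assuming hypothetical names filter_dots_from, start and max (none of these come from the answer itself):

```shell
# Hypothetical helper: count "." only from field "start" onward and
# print the line when fewer than "max" such fields are found
filter_dots_from() {
  awk -v start="$1" -v max="$2" '{
    n = 0
    for (i = start; i <= NF; i++)
      if ($i == ".") n++
  }
  n < max'
}

# Periods in fields 1 and 2 are ignored when counting starts at field 3
printf '%s\n' '. . 0 0 0' '0 0 . . 0' | filter_dots_from 3 2
```

Here the first line is kept because its periods sit before field 3, while the second is dropped because it has two periods from field 3 onward.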
Upvotes: 2
Reputation: 207670
Similar to @spasic's answer, but easier (for me) to read!
perl -ane 'print if (grep { /^\.$/} @F) < 2' file
A B C D E
0 1 . 0 0
1 ./. 0 1 1
1 1 0 0 0
The -a separates the space-separated fields into an array called @F for me. I then grep in the array @F looking for elements that consist of just a period, i.e. those that start with a period and end immediately after the period. That counts the lone periods in each line and I print the line if that number is less than 2.
Upvotes: 1
Reputation: 23677
$ cat ip.txt
A B C D E
0 1 . 0 0
1 ./. 0 1 1
1 1 0 0 0
0 0 . . 0
. ./. . . .
$ perl -ne '(@c)=/\.\/\.|\./g; print if $#c < 1' ip.txt
A B C D E
0 1 . 0 0
1 ./. 0 1 1
1 1 0 0 0
(@c)=/\.\/\.|\./g builds an array of the ./. or . matches from the current line.
$#c indicates the index of the last element, i.e. (size of array - 1).
To allow up to two ./. or . matches per line, use $#c < 2
Upvotes: 1