Reputation: 11
Hi everyone,
I am looking for a way to keep only the records from a txt file that meet a certain condition.
Here is an example of the data:
aa bb cc
11 22 33
44 55 66
77 88 99

aa bb cc
11 22 33 44 55 66 77
44 55 66 66
77 88 99

aa bb cc
11 22 33 44 55
44 55 66
77 88 99 77

...
Basically, the file consists of records of 5 lines each: 4 lines containing tab-delimited strings/numbers, followed by an empty line (\n).
The first line of a record always has 3 elements, while the number of elements in the 2nd, 3rd, and 4th lines can vary.
What I need to do is remove every record (5-line block) where the total number of elements in the second line is > 3 (I don't care about the number of elements in the other lines). The output for the example should look like this:
aa bb cc
11 22 33
44 55 66
77 88 99
...
So only the records whose second line has 3 elements are kept and written to a new txt file.
I tried to do it with awk by modifying FS and RS values like this:
awk 'BEGIN {RS="\n\n"; FS="\n";}
{if(length($2)==3) print $2"\n\n"; }' test_filter.txt
but if(length($2)==3) is not correct, since I should count the number of entries in the 2nd field rather than its length, and I can't find how to do that. Any help would be much appreciated!
thanks in advance,
Upvotes: 1
Views: 69
Reputation: 34244
You can use the split() function to break a line/field/string into components; in this case:
n=split($2,arr," ")
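Where:
we split field #2 ($2) using a single space (" ") as the delimiter ...
the components are stored in the array arr[] and ...
n is the number of elements in the array
Note that in awk a single-space delimiter splits on any run of spaces or tabs, so this also works for tab-delimited lines.
Pulling this into OP's current awk code, along with a couple of small changes, we get: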
awk 'BEGIN {ORS=RS="\n\n"; FS="\n"} {n=split($2,arr," "); if (n>=4) next}1' test_filter.txt
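To see the counting step in isolation, here is a minimal sketch. The function name count_elements is made up for illustration; it only relies on split() returning the number of pieces it produced.

```shell
# count_elements is a hypothetical helper: for each input line it prints
# the number of whitespace-separated elements found by split().
count_elements() {
    awk '{ n = split($0, arr, " "); print n }' "$@"
}

# One space-separated line and one tab-separated line.
printf '11 22 33\n11\t22\t33\t44\n' | count_elements
# prints 3, then 4
```

The same n is what the one-liner above compares against 4 to decide whether to skip a record.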
With an additional block added to our sample:
$ cat test_filter.txt
aa bb cc
11 22 33
44 55 66
77 88 99

aa bb cc
11 22 33 44 55 66 77
44 55 66 66
77 88 99

aa bb cc
111 222 333
444 555 665
777 888 999

aa bb cc
11 22 33 44 55
44 55 66
77 88 99 77

This awk solution generates:
aa bb cc
11 22 33
44 55 66
77 88 99

aa bb cc
111 222 333
444 555 665
777 888 999
# blank line here
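A note on portability: a multi-character RS such as "\n\n" works in GNU awk, but POSIX leaves multi-character RS values unspecified, so other awk implementations may behave differently. As a sketch of an alternative, assuming records are separated by blank lines as in the sample, awk's paragraph mode (RS="") should give the same result. The function name filter_records is hypothetical.

```shell
# filter_records is a hypothetical helper: keeps only the blank-line
# separated records whose 2nd line has at most 3 elements.
filter_records() {
    # RS=""      paragraph mode: records are separated by blank lines
    # FS="\n"    each line of a record becomes one field
    # ORS="\n\n" print a blank line after every kept record
    awk 'BEGIN { RS = ""; FS = "\n"; ORS = "\n\n" }
         split($2, arr, " ") <= 3' "$@"
}

# Small inline sample: the second record has 4 elements on its 2nd line.
printf 'aa bb cc\n11 22 33\n44 55 66\n\naa bb cc\n1 2 3 4\n5 6 7\n\n' | filter_records
```

Usage on the file from the question would be filter_records test_filter.txt > filtered.txt (the output file name is just an example). One difference from RS="\n\n": paragraph mode treats several consecutive blank lines as a single separator.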
Upvotes: 2