user1352084

Reputation: 469

filtering a file by columns

I have a Unix question. I have a file that looks like this:

AAAA    0   1   2   2   0
BBBBB   2   2   2   2   2
CCCCC   1   1   0   1   1
DDDD    0   0   0   0   0
EEEEE   2   2   0   2   2

The file has many thousands of rows like this and is tab-delimited. The first column is a name, and the 2nd through 6th columns are the data; only the data columns matter. I need to output every line in which the 2nd through 6th columns contain no more than one 0 (zero). For example, I would want the output to look like this:

BBBBB   2   2   2   2   2
CCCCC   1   1   0   1   1
EEEEE   2   2   0   2   2

I have been trying to do this as simply as possible and have tried the following awk command:

awk 'BEGIN{out!=0;}{if($2!=0)out++;if($3!=0)out++;if($4!=0)out++;if($5!=0)out++;if($6!=0)out++;if (out>=4)print;}'

But, when I try this, it just gives me the original input file. I am not sure what is wrong, or if I am even taking a correct approach. Any help would be appreciated.

Upvotes: 2

Views: 1515

Answers (5)

Vijay

Reputation: 67301

A much simpler way to do this is:

awk '{count=0;for(i=2;i<=NF;i++){if($i~/0/)++count;}if(count <=1)print}' file1

Tested below:

> cat file1
AAAA    0       1       2       2       0
BBBBB   2       2       2       2       2
CCCCC   1       1       0       1       1
DDDD    0       0       0       0       0
EEEEE   2       2       0       2       2
sEEEE   2       0       0       0       2
> awk '{count=0;for(i=2;i<=NF;i++){if($i~/0/)++count;}if(count <=1)print}' file1
BBBBB  2 2 2 2 2
CCCCC  1 1 0 1 1
EEEEE  2 2 0 2 2
> 
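One caveat with the regex test above: $i~/0/ matches any field that contains the digit 0, so a value such as 10 would be counted as a zero. The following is a minimal sketch, assuming the data columns may hold multi-digit values, that compares the two tests. The sample rows and the variable names (sample, regex_out, numeric_out) are made up for illustration and are not from the original file.

```shell
# Hypothetical sample: XXXX holds multi-digit values that contain the digit 0.
sample=$(printf 'XXXX\t10\t20\t3\t4\t5\nYYYY\t0\t0\t3\t4\t5\nZZZZ\t0\t1\t2\t3\t4\n')

# Regex test: counts every field that contains a "0" anywhere in it.
regex_out=$(printf '%s\n' "$sample" |
  awk '{count=0; for(i=2;i<=NF;i++) if($i ~ /0/) ++count; if(count<=1) print $1}')

# Numeric test: counts only fields whose value equals zero.
numeric_out=$(printf '%s\n' "$sample" |
  awk '{count=0; for(i=2;i<=NF;i++) if($i == 0) ++count; if(count<=1) print $1}')

echo "regex test keeps:"
echo "$regex_out"
echo "numeric test keeps:"
echo "$numeric_out"
```

With single-digit data, as in the question, both tests select the same lines; they only differ once a column can hold values like 10 or 105.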

Upvotes: 0

Karl Nordström

Reputation: 327

Assuming that the columns conform to a particular format can be dangerous. Below is a simple solution that relies on comparisons evaluating to 0 or 1, like Boolean variables:

awk '($2==0) + ($3==0) + ($4==0) + ($5==0) + ($6==0) <2' file.txt
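In awk, a comparison such as ($2==0) evaluates to 1 when true and 0 when false, so adding the comparisons counts the zeros. As a rough sketch of the same idea for files with more data columns, the sum can be built in a loop. The variable name max and the choice to print only the name column are illustrative assumptions, not part of the answer above.

```shell
# Recreate the question's sample as tab-delimited text.
data=$(printf 'AAAA\t0\t1\t2\t2\t0\nBBBBB\t2\t2\t2\t2\t2\nCCCCC\t1\t1\t0\t1\t1\nDDDD\t0\t0\t0\t0\t0\nEEEEE\t2\t2\t0\t2\t2\n')

# ($i == 0) is 1 or 0, so the running sum is the number of zero fields.
# "max" is an illustrative name for the allowed number of zeros per line.
kept=$(printf '%s\n' "$data" | awk -F'\t' -v max=1 '{
  zeros = 0
  for (i = 2; i <= NF; i++) zeros += ($i == 0)
  if (zeros <= max) print $1
}')

echo "$kept"
```

Replacing print $1 with a plain print would output the whole line, as in the one-liner above.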

Upvotes: 0

Steve

Reputation: 54572

One way using perl:

perl -ne 'print if(tr/0/0/ <= 1)' file.txt

I'm assuming that the names on each line don't contain digits (specifically 0) and that each data value is a single digit. Also, if you add the -i flag, perl will edit the file in place.

Upvotes: 0

glenn jackman

Reputation: 247192

awk '
  {
    nzero=0
    for (fld = 2; nzero <= 1 && fld <= 6; fld++) {
      if ($fld == 0) nzero++
    }
    if (nzero <= 1) print
  }
' filename

Upvotes: 0

bsravanin

Reputation: 1873

The mistake is that you never reset the out variable for each record; it is only touched once, in the BEGIN block. (The BEGIN block also uses a "not equals" comparison, out!=0, where an assignment was intended.)

awk '{out = 0; if($2!=0) out++; if($3!=0) out++; if($4!=0) out++; if($5!=0) out++; if($6!=0) out++; if(out>=4) print}'
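To see why the missing reset matters, the sketch below prints the counter next to each name, first without and then with a per-record reset. It assumes the question's five sample rows; the shell variable names (data, running, per_line) are illustrative only.

```shell
# Recreate the question's sample as tab-delimited text.
data=$(printf 'AAAA\t0\t1\t2\t2\t0\nBBBBB\t2\t2\t2\t2\t2\nCCCCC\t1\t1\t0\t1\t1\nDDDD\t0\t0\t0\t0\t0\nEEEEE\t2\t2\t0\t2\t2\n')

# No reset: "out" keeps accumulating across records.
running=$(printf '%s\n' "$data" |
  awk '{ for (i = 2; i <= 6; i++) if ($i != 0) out++; print $1, out }')

# Reset at the start of every record: "out" is the per-line count of non-zero fields.
per_line=$(printf '%s\n' "$data" |
  awk '{ out = 0; for (i = 2; i <= 6; i++) if ($i != 0) out++; print $1, out }')

echo "without reset:"
echo "$running"
echo "with reset:"
echo "$per_line"
```

Without the reset, the total passes 4 within the first couple of lines and never drops, so the test out>=4 is true for almost every line that follows.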

Upvotes: 2
