Shaun

Reputation: 401

finding differences across a row with awk

I have a table in which most of the values in a given row are the same. What I want to pull out are any rows where at least one of the values is different. I’ve figured out how to do that with something like this:

awk -F "\t" '{if (($4!=$5)||($5!=$6)||($6!=$7)) print $0;}'

The only problem is that there are 40-odd columns to compare. Is there a more elegant way to compare multiple columns for differences? BTW, these are non-numerical values, so a fancy math trick won't work.

Thanks all. I'm a newbie, so I have to admit that I don't understand all of the commands, etc., but I can look them up from here. Not sure whose suggestion I'll go with, but I learn more from concrete examples than from textbook explanations, so having these different solutions is a big help with my learning curve.

Upvotes: 2

Views: 154

Answers (3)

Steve

Reputation: 54402

You could just use a for loop:

awk -F "\t" '{ for(i=4;i<NF;i++) if ($i != $(i+1)) { print; next } }' file

Adjust accordingly. HTH.

Upvotes: 0

Chris Seymour

Reputation: 85795

A fancy math trick might not work but how about:

$ cat file
one one one one two
two two two two two
three four four five

$ awk '{f=$0;gsub($1,"")}NF{print f}' file 
one one one one two
three four four five

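First we store the line in its original state (f=$0), then we do a global substitution that deletes everything matching the first field. If all fields are the same, nothing is left, so NF is 0 and nothing is printed; otherwise we print the original line. Note that gsub treats $1 as a regular expression and also matches substrings, so this assumes the values contain no regex metacharacters and no value is contained inside another.

Your script starts at $4, which suggests you are only interested in changes from that field on, in which case (assuming the first three fields do not contain the value of $4):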

$ awk '{f=$0;gsub($4,"")}NF>3{print f}' file 
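To make the substring caveat concrete, here is a small sketch on a made-up one-line input (the data is an assumption for illustration, not from the question). The first command is the gsub approach, which misses the row because "a" also matches inside "aa"; the second compares whole fields against $1 instead.

```shell
# gsub approach: "a" is removed from every field, including from inside "aa",
# so nothing but spaces remains, NF becomes 0, and the differing row is not printed.
printf 'a aa a\n' | awk '{f=$0;gsub($1,"")}NF{print f}'

# Whole-field comparison: $2 ("aa") is not equal to $1 ("a"), so the row is printed.
printf 'a aa a\n' | awk '{ for (i = 2; i <= NF; i++) if ($i != $1) { print; next } }'
```

If the values are short or can be prefixes of one another, the whole-field comparison is the safer choice.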

Upvotes: 6

torek

Reputation: 488193

If the fields are not all identical, then at least one field must differ from field 1. So just loop from 2 to NF (the number of fields), comparing each field against $1:

awk -F "\t" '{ for (i = 2; i <= NF ;i++) if ($i != $1) { print; next; }}'

You can tune this to ignore leading fields (e.g., start at 5 and compare against $4) as needed.
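As a rough sketch of that tuning, the starting column can be passed in with -v rather than hard-coded. The function name diff_rows and the variable start are made up for this example, and the sample rows are invented; it assumes tab-separated input as in the question.

```shell
# Hypothetical helper: print rows where any field after column `start`
# differs from the field in column `start`. Reads the named files, or stdin if none.
diff_rows() {
    start=$1; shift
    awk -F '\t' -v start="$start" '{
        # $start is the reference field; compare every later field against it
        for (i = start + 1; i <= NF; i++)
            if ($i != $start) { print; next }
    }' "$@"
}

# Usage: columns 1-3 are ignored, columns 4 onward are compared.
# Only the r2 row is printed, because its fifth field (B) differs from its fourth (A).
printf 'r1\tx\ty\tA\tA\tA\nr2\tx\ty\tA\tB\tA\n' | diff_rows 4
```

With a real file this would be called as diff_rows 4 file.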

Upvotes: 0
