Dmitry Kuzminov
Dmitry Kuzminov

Reputation: 6584

AWK select rows where all columns are equal

I have a file with tab-separated values where the number of columns is not known a priori. In other words the number of columns is consistent within a file but different files have different number of columns. The first column is a key, the other columns are some arbitrary values.

I need to filter out the rows where the values are not the same. For example, assuming that the number of columns is 4, I need to keep the first 2 rows and filter out the 3-rd:

1   A   A   A
2   B   B   B
3   C   D   C

I'm planning to use AWK for this purpose, but I don't know how to deal with the fact that the number of columns is unknown. The case of the known number of columns is simple, this is a solution for 4 columns:

$2 == $3 && $3 == $4 {print}

How can I generalize the solution for arbitrary number of columns?

Upvotes: 1

Views: 1053

Answers (5)

kvantour
kvantour

Reputation: 26471

Here is a generalisation of the problem:

Select all lines where a set of columns have the same value: c1 c2 c3 c4 ..., where ci can be any number:

Assume we want to select the columns: 2 3 4 11 15

awk 'BEGIN{n=split("2 3 4 11 15",a)}
     {for(i=2;i<=n;++i) if ($(a[i])!=$(a[1])) next}1' file

A bit more robust, in case a line might not contain all fields:

awk 'BEGIN{n=split("2 3 4 11 15",a)}
     {for(i=2;i<=n;++i) if (a[i] <= NF) if ($(a[i])!=$(a[1])) next}1' file

Upvotes: 0

oguz ismail
oguz ismail

Reputation: 50750

If you guarantee no field contains regex-active chars and the first field never match the second, and there is no blank line in the input:

awk '{tmp=$0;gsub($2,"")} NF==1{print tmp}' file

Note that this solution is designed for this specific case and less extendable than others.

Upvotes: 4

David C. Rankin
David C. Rankin

Reputation: 84531

Another slight twist on the approach. In your case you know you want to compare fields 2-4 so you can simply loop from i=3;i<=NF checking $i!=$(i-1) for equality, and if it fails, don't print, get the next record, e.g.

awk '{for(i=3;i<=NF;i++)if($i!=$(i-1))next}1'

Example Use/Output

With your data in file.txt:

$ awk '{for(i=3;i<=NF;i++)if($i!=$(i-1))next}1' file.txt
1   A   A   A
2   B   B   B

Upvotes: 3

jxc
jxc

Reputation: 13998

I'd use a counter t with initial value of 2 to add the number of times $i == $(i+1) where i iterates from 2 to NF-1. print the line only if t==NF is true:

awk -F'\t' '{t=2;for(i=2;i<NF;i++){t+=$i==$(i+1)}}t==NF' file.txt

Upvotes: 0

RavinderSingh13
RavinderSingh13

Reputation: 133428

Could you please try following. This will compare all columns from 2nd column to till last column and check if every element is equal or not. If they are all same it will print line.

awk '{for(i=3;i<=NF;i++){if($(i-1)==$i){count++}};if((NF-2)==count){print};count=""}' Input_file

OR(by hard coding $2 in code, since if $2=$3 AND $3=$4 it means $2=$3=$4 so intentionally taking $2 in comparison rather than having i-1 fetching its previous value.)

awk '{for(i=3;i<=NF;i++){if($2==$i){count++}};if((NF-2)==count){print};count=""}'  Input_file

Upvotes: 0

Related Questions