Reputation: 6584
I have a file with tab-separated values where the number of columns is not known a priori. In other words the number of columns is consistent within a file but different files have different number of columns. The first column is a key, the other columns are some arbitrary values.
I need to filter out the rows where the values are not the same. For example, assuming that the number of columns is 4, I need to keep the first 2 rows and filter out the 3-rd:
1 A A A
2 B B B
3 C D C
I'm planning to use AWK for this purpose, but I don't know how to deal with the fact that the number of columns is unknown. The case of the known number of columns is simple, this is a solution for 4 columns:
$2 == $3 && $3 == $4 {print}
How can I generalize the solution for arbitrary number of columns?
Upvotes: 1
Views: 1053
Reputation: 26471
Here is a generalisation of the problem:
Select all lines where a set of columns have the same value: c1 c2 c3 c4 ..., where ci can be any number:
Assume we want to select the columns: 2 3 4 11 15
awk 'BEGIN{n=split("2 3 4 11 15",a)}
{for(i=2;i<=n;++i) if ($(a[i])!=$(a[1])) next}1' file
A bit more robust, in case a line might not contain all fields:
awk 'BEGIN{n=split("2 3 4 11 15",a)}
{for(i=2;i<=n;++i) if (a[i] <= NF) if ($(a[i])!=$(a[1])) next}1' file
Upvotes: 0
Reputation: 50750
If you guarantee no field contains regex-active chars and the first field never match the second, and there is no blank line in the input:
awk '{tmp=$0;gsub($2,"")} NF==1{print tmp}' file
Note that this solution is designed for this specific case and less extendable than others.
Upvotes: 4
Reputation: 84531
Another slight twist on the approach. In your case you know you want to compare fields 2-4
so you can simply loop from i=3;i<=NF
checking $i!=$(i-1)
for equality, and if it fails, don't print, get the next record, e.g.
awk '{for(i=3;i<=NF;i++)if($i!=$(i-1))next}1'
Example Use/Output
With your data in file.txt
:
$ awk '{for(i=3;i<=NF;i++)if($i!=$(i-1))next}1' file.txt
1 A A A
2 B B B
Upvotes: 3
Reputation: 13998
I'd use a counter t
with initial value of 2
to add the number of times $i == $(i+1)
where i iterates from 2 to NF-1. print the line only if t==NF
is true:
awk -F'\t' '{t=2;for(i=2;i<NF;i++){t+=$i==$(i+1)}}t==NF' file.txt
Upvotes: 0
Reputation: 133428
Could you please try following. This will compare all columns from 2nd column to till last column and check if every element is equal or not. If they are all same it will print line.
awk '{for(i=3;i<=NF;i++){if($(i-1)==$i){count++}};if((NF-2)==count){print};count=""}' Input_file
OR(by hard coding $2
in code, since if $2=$3
AND $3=$4
it means $2=$3=$4
so intentionally taking $2
in comparison rather than having i-1
fetching its previous value.)
awk '{for(i=3;i<=NF;i++){if($2==$i){count++}};if((NF-2)==count){print};count=""}' Input_file
Upvotes: 0