Reputation: 1212
I have a file with rows of integers. I want to remove rows where the value in the second column does not appear in any other row's second column.
ex:
4 5 6 // row 0
1 2 3 // row 1
3 3 3 // row 2
4 5 4 // row 3
3 4 5 // row 4
2 2 4 // row 5
Column 2 has values 3 and 4 that occur only once across all the rows. I want to remove the rows that have these unique column-2 values: rows 2 and 4.
I also need the result sorted by column 2, so what I currently have is
$ sort -k2 -n filename > filename_sorted
But I'm not sure how to identify the rows whose column 2 value is unique and remove them.
Upvotes: 1
Views: 110
Reputation: 785186
You can use this sort + awk:
sort -k2 -n file | awk 'p == $2{if (r != "") print r; print; r=""} p != $2{p=$2; r=$0}'
1 2 3 // row 1
2 2 4 // row 5
4 5 4 // row 3
4 5 6 // row 0
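The one-liner is terse; spelled out with comments it looks like this (a sketch of the same logic, run over the question's sample data without the row comments):

```shell
# Recreate the question's sample data (without the row comments).
printf '4 5 6\n1 2 3\n3 3 3\n4 5 4\n3 4 5\n2 2 4\n' > file

# After sorting, rows sharing a column-2 value are adjacent, so a
# one-row lookbehind is enough to detect singletons.
sort -k2 -n file | awk '
p == $2 {                  # same column-2 value as the previous row
    if (r != "") print r   # flush the held first row of the group
    print                  # print the current row
    r = ""                 # mark the group as already flushed
}
p != $2 {                  # a new column-2 value starts here
    p = $2                 # remember it
    r = $0                 # hold the row until a duplicate confirms it
}'
```

Rows whose column-2 value never repeats stay held in r and are silently overwritten, which is what removes them.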
Upvotes: 0
Reputation: 289725
Just process the file twice: first to count the number of times each second-column value appears, and then to print the rows whose value appears more than once:
$ awk 'FNR==NR{a[$2]++; next} a[$2]>1' file file
4 5 6 // row 0
1 2 3 // row 1
4 5 4 // row 3
2 2 4 // row 5
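Since the question also asks for the result sorted by column 2, the same two-pass filter can simply be piped through sort (a minimal sketch over the sample data):

```shell
# Recreate the sample data (without the row comments).
printf '4 5 6\n1 2 3\n3 3 3\n4 5 4\n3 4 5\n2 2 4\n' > file

# Pass 1 (FNR==NR) counts each column-2 value; pass 2 keeps rows
# whose value was seen more than once; sort orders by column 2.
awk 'FNR==NR{a[$2]++; next} a[$2]>1' file file | sort -k2 -n
# prints:
# 1 2 3
# 2 2 4
# 4 5 4
# 4 5 6
```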
Upvotes: 3
Reputation: 4841
This should do the filtering on any awk (note that every row for a given column-2 value must be kept, and the END loop must print the values that occur more than once):
awk '
{
    seen[$2]++                    # count rows per column-2 value
    line[$2] = line[$2] $0 ORS    # keep every row for that value
}
END {
    for (i in seen)
        if (seen[i] > 1)          # drop values that occur only once
            printf "%s", line[i]
}' file
If you happen to use gawk 4.x, you can also get the output sorted by column 2 in one go by adding the following (it orders the END loop's traversal by the numeric array index, i.e. the column-2 value):
BEGIN {PROCINFO["sorted_in"] = "@ind_num_asc"}
Otherwise, the easiest option is to pipe the output to sort:
... | sort -nk2
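As a self-contained sketch of that pipeline over the question's sample data (accumulating every row per column-2 value so groups larger than one survive intact, then sorting the survivors):

```shell
# Recreate the sample data (without the row comments).
printf '4 5 6\n1 2 3\n3 3 3\n4 5 4\n3 4 5\n2 2 4\n' > file

# count[v] is how many rows carry column-2 value v; rows[v]
# accumulates those rows. Values seen only once are dropped,
# and sort orders the survivors numerically by column 2.
awk '
{
    count[$2]++
    rows[$2] = rows[$2] $0 ORS
}
END {
    for (v in count)
        if (count[v] > 1)
            printf "%s", rows[v]
}' file | sort -nk2
```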
Upvotes: 0