Here, two rows are considered redundant if their second field is the same. Is there a Unix/Linux command that can do the following?
Input:
1 aa
2 aa
1 ss
3 dd
4 dd
Desired result:
1 aa
1 ss
3 dd
I generally use the following command, but it does not do what I want here:
sort -k2 /Users/fahim/Desktop/delnow2.csv | uniq
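The reason the command above doesn't help is that uniq compares whole lines, so "1 aa" and "2 aa" never count as duplicates. A minimal sketch of a sort-based fix, assuming the file is tab-separated (as the awk commands in this thread suggest) and using a hypothetical sample.txt in place of the real file, is to tell uniq to skip the first field with -f 1:

```shell
# Hypothetical sample file (assumed tab-separated), built from the question's data
printf '1\taa\n2\taa\n1\tss\n3\tdd\n4\tdd\n' > sample.txt

# Original attempt: uniq compares entire lines, and "1<TAB>aa" differs from
# "2<TAB>aa", so all 5 lines come back unchanged.
sort -k2 sample.txt | uniq

# uniq -f 1 skips the first field and compares only the rest of each line,
# so adjacent lines with the same second field collapse to one.
sort -k2,2 sample.txt | uniq -f 1
```

Note that uniq -f 1 compares everything after the first field, which equals "the second field" only for two-column data, and that the output comes out ordered by the second field (aa, dd, ss) rather than in the input order shown in the desired result.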
Edit:
My file has roughly 25 million lines. The solution suggested by @Steve took 33 seconds:
$ date; awk -F '\t' '!a[$2]++' myfile.txt > outfile.txt; date
Wed Nov 27 18:00:16 EST 2013
Wed Nov 27 18:00:49 EST 2013
The sort-and-uniq approach takes far too long; I gave up after waiting 5 minutes.
If I understand correctly, you want the file sorted and made unique on the second field. Adding -u to sort does this:
sort -u -k2 /Users/fahim/Desktop/delnow2.csv
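One detail worth knowing: -k2 makes the key run from the second field to the end of the line, which only equals "the second field" when there are exactly two columns. A sketch that restricts the key to field 2 alone with -k2,2, assuming tab-separated data and a hypothetical sample.txt:

```shell
# Hypothetical sample file (assumed tab-separated), built from the question's data
printf '1\taa\n2\taa\n1\tss\n3\tdd\n4\tdd\n' > sample.txt

# -k2,2 limits the sort key to field 2 only;
# -u then outputs just one line per distinct key.
sort -u -k2,2 sample.txt
```

The result is ordered by key (aa, dd, ss), not by input order. POSIX only promises that one line per set of equal keys survives, so which of "1 aa" and "2 aa" is kept may depend on the sort implementation.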
Perhaps this is what you're looking for:
awk -F "\t" '!a[$2]++' file
Results:
1 aa
1 ss
3 dd
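For readers unfamiliar with the '!a[$2]++' idiom: a[$2]++ evaluates to how many times the second field has been seen so far (0 the first time) and then increments it, so !a[$2]++ is true only on a key's first appearance, and awk prints the line by default. A longhand sketch of the same logic, using a hypothetical tab-separated sample.txt and an illustrative array name seen:

```shell
# Hypothetical sample file (assumed tab-separated), built from the question's data
printf '1\taa\n2\taa\n1\tss\n3\tdd\n4\tdd\n' > sample.txt

# Longhand equivalent of '!a[$2]++': print a line only the first time
# its second field is seen. No sorting, so input order is preserved.
awk -F '\t' '
  !($2 in seen) {   # this key has not appeared yet
    seen[$2] = 1    # remember it
    print           # output the whole line
  }
' sample.txt
```

This makes a single pass over the input and keeps one array entry per distinct key, so memory grows with the number of distinct second-field values rather than the number of lines. Avoiding a full sort is likely why it finished much faster than the sort-based approaches on the 25-million-line file.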