Reputation: 937
I am looking for a way to remove duplicate rows from a file I have open in Notepad++. The rows are not exact duplicates per se. Here's the situation: I have a large file of capitalized company names, each followed by a probability value (the two columns are separated by a tab). The format looks like this:
ATT .7213
SAMSUNG .01294
SAMSUNG .90222
So, I need to remove one of these rows because they match in the first column. I don't have a preference for which one is removed, as long as I end up with one row per company. I tried unique sorting with TextFX, but it only detects duplicates of the whole row, not just the first column. If anyone could offer a handy solution, I would greatly appreciate it. Bash answers using awk, sed, or cut are also acceptable, as are regular expressions.
Thank you!
Upvotes: 0
Views: 1302
Reputation: 532303
Use sort:
sort -k1,1 -u companies.txt
The output consists of full lines, sorted by the first field, but only the sort key (the first field) is considered when identifying duplicates.
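If a company name can contain spaces, the default whitespace field splitting would cut the key short. A minimal sketch that sets the separator to a tab explicitly, assuming the file is tab-separated as described in the question (the sample rows and the variable name `deduped` are only illustrative):

```shell
# Sample data (illustrative): company name and probability, tab-separated.
printf 'ATT\t.7213\nSAMSUNG\t.01294\nSAMSUNG\t.90222\n' > companies.txt

# -t sets the field separator to a literal tab, -k1,1 limits the key to
# the first field, and -u keeps one line per distinct key.
deduped=$(sort -t "$(printf '\t')" -k1,1 -u companies.txt)
printf '%s\n' "$deduped"
```

This prints one ATT line and one SAMSUNG line. Which of the two SAMSUNG rows survives is not something to rely on across sort implementations.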
Upvotes: 1
Reputation: 123648
Using awk, you could say:
awk '!a[$1]++' filename
This keeps only the first line seen for each distinct value of the first field, and it preserves the original order of the lines.
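To see why this works, here is a small sketch on sample data. It assumes tab-separated input, so `-F'\t'` is added to keep names containing spaces in one field; the array name `seen` and the sample rows are illustrative:

```shell
# Sample data (illustrative): note the first SAMSUNG row comes before ATT.
printf 'SAMSUNG\t.01294\nATT\t.7213\nSAMSUNG\t.90222\n' > companies.txt

# seen[$1]++ evaluates to 0 the first time a key appears and to a
# positive count afterwards, so !seen[$1]++ is true only for the
# first occurrence; awk prints the line when the pattern is true.
first_kept=$(awk -F'\t' '!seen[$1]++' companies.txt)
printf '%s\n' "$first_kept"
```

The output is the `SAMSUNG .01294` row followed by the `ATT .7213` row: the later SAMSUNG row is dropped and the input order is unchanged.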
Upvotes: 3