Tastybrownies

Reputation: 937

Removing duplicate rows from Notepad++

I am looking for a way to remove duplicate rows from my Notepad++ file. The rows are not exact duplicates per se. Here's the situation. I have a large file of company names in capital letters, each followed by a probability value and separated from it by a tab. So the format looks like this:

ATT   .7213
SAMSUNG   .01294
SAMSUNG   .90222

So, I need to remove one of these rows because their first columns match. I don't have a preference about which one is removed, as long as only one row per company remains at the end. I have tried unique sorting with TextFX, but it only treats whole rows as duplicates, not rows that match in the first column alone. If anyone could offer a handy solution I would greatly appreciate it. Bash script answers using awk, sed, or cut are also acceptable, as are regular expressions.

Thank you!

Upvotes: 0

Views: 1302

Answers (2)

chepner

Reputation: 532303

Use sort:

sort -k1,1 -u companies.txt

The output consists of full lines, but only the sort key (the first field) is compared when identifying duplicates. Note that the surviving lines come out sorted by that key rather than in their original order.
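A minimal sketch of the sort approach run against the sample rows from the question. It assumes the file really is tab-separated, so it adds `-t` with a tab character; that way a company name containing spaces (say, "GENERAL ELECTRIC") stays in one field. The helper name `dedupe_by_first_column_sorted` is hypothetical, and which of the two SAMSUNG rows survives should not be relied on.

```shell
#!/usr/bin/env bash
# Sketch only: keep one line per company with sort, assuming the input
# is tab-separated as described in the question.

# Hypothetical helper name; wraps the answer's command with -t set to a
# tab so that names containing spaces stay in a single field.
dedupe_by_first_column_sorted() {
  # -t TAB : split fields on tabs only (assumption about the input)
  # -k1,1  : the sort key is field 1 and nothing after it
  # -u     : print one line per run of lines with equal keys
  sort -t "$(printf '\t')" -k1,1 -u "$1"
}

# Recreate the question's sample rows in a temporary file.
sample_file="$(mktemp)"
trap 'rm -f "$sample_file"' EXIT
printf 'ATT\t.7213\nSAMSUNG\t.01294\nSAMSUNG\t.90222\n' > "$sample_file"

# Prints the ATT row and exactly one SAMSUNG row.
dedupe_by_first_column_sorted "$sample_file"
```

To keep the result, redirect it to a new file (for example `> companies_dedup.txt`) rather than back onto the input file, since the shell truncates the output file before sort reads it.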

Upvotes: 1

devnull

Reputation: 123648

Using awk, you could say:

awk '!a[$1]++' filename

This keeps the first line seen for each distinct value of the first field and drops any later lines with the same value, preserving the original line order.
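For readers unfamiliar with the idiom, here is a sketch of the same one-liner written out with comments. The `-F'\t'` option is an added assumption (tab-separated input, so names with spaces stay in `$1`), and the array name `seen` and helper name `dedupe_by_first_column_awk` are hypothetical; behaviour is otherwise the same as `awk '!a[$1]++' filename`.

```shell
#!/usr/bin/env bash
# Sketch only: the answer's awk one-liner written out with comments.
# -F'\t' is an added assumption (tab-separated input).

# Hypothetical helper name.
dedupe_by_first_column_awk() {
  awk -F'\t' '
    {
      # seen[$1] is empty (false) the first time a company appears,
      # so !seen[$1] is true and the line is printed. The ++ runs
      # after the test, so every later line with the same company
      # name finds a non-zero count and is skipped.
      if (!seen[$1]++) print
    }
  ' "$1"
}

# Recreate the question's sample rows in a temporary file.
sample_file="$(mktemp)"
trap 'rm -f "$sample_file"' EXIT
printf 'ATT\t.7213\nSAMSUNG\t.01294\nSAMSUNG\t.90222\n' > "$sample_file"

dedupe_by_first_column_awk "$sample_file"
# prints:
# ATT<TAB>.7213
# SAMSUNG<TAB>.01294
```

Unlike the sort approach, this reads the file once without sorting it, so which duplicate survives is predictable: always the first one in the file.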

Upvotes: 3
