Removing duplicate rows from Notepad++

Question

I am looking for a way to remove duplicate rows from a file I am editing in Notepad++. The rows are not exact duplicates per se. Here's the situation: I have a large file of capitalized company names, each followed by a probability value, with the two columns separated by a tab. The format looks like this:

ATT   .7213
SAMSUNG   .01294
SAMSUNG   .90222

I need to remove one of the SAMSUNG rows because their first columns match. I have no preference as to which one is removed, as long as I end up with one row per company. I tried unique sorting with TextFX, but it compares whole rows rather than just the first column. If anyone could offer a handy solution, I would greatly appreciate it. Bash answers using awk, sed, or cut are acceptable, as are regular expressions.

Thank you!

chepner · Accepted Answer

Use sort:

sort -k1,1 -u companies.txt

The output will consist of the full line, but only the sorting key (the first field) will be considered for identifying duplicates.
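Two caveats may matter here. By default, sort splits fields on blanks, so a company name containing a space would be cut short unless the tab is given explicitly as the separator. Also, the output comes back sorted rather than in the original order. The following is a minimal sketch of both variants, not a definitive recipe: the sample data and the file names under a temporary directory are made up for illustration, and the awk one-liner is a commonly used alternative, not part of the answer above.

```shell
# Hypothetical sample data in a scratch directory (stands in for the real file).
workdir=$(mktemp -d)
printf 'ATT\t.7213\nSAMSUNG\t.01294\nSAMSUNG\t.90222\n' > "$workdir/companies.txt"

# Variant 1: sort, with the tab set explicitly as the field separator so that
# names containing spaces stay in one field. Output is sorted by column 1.
tab=$(printf '\t')
sort -t "$tab" -k1,1 -u "$workdir/companies.txt" > "$workdir/sorted_unique.txt"

# Variant 2: awk keeps the first line seen for each value of column 1 and
# preserves the original line order. seen[$1]++ is 0 (false) only the first
# time a key appears, so !seen[$1]++ is true only for that first line.
awk -F '\t' '!seen[$1]++' "$workdir/companies.txt" > "$workdir/deduped.txt"

cat "$workdir/deduped.txt"
```

With the sample data, the awk variant keeps the ATT row and the first SAMSUNG row (.01294). If a different row should survive, for example the one with the highest probability, the file can be sorted on the second column before deduplicating.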
