Bruno

Reputation: 5822

Remove duplicate lines based on a partial line comparison

I have a text file that contains thousands of lines of text as below.

123 hello world
124 foo bar
125 hello world

I would like to test for duplicates by checking only a sub-section of each line (everything after the leading number, in this case). For the above, it should output:

123 hello world
124 foo bar

Is there a vim command that can do this?

Update: I am on a Windows machine, so I can't use uniq

Upvotes: 6

Views: 4099

Answers (4)

carlo.polisini

Reputation: 316

In vim I was able to sort and remove duplicates with the following command:

:sort u
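
Note that :sort u compares entire lines, so on the question's sample it keeps all three lines, since the leading numbers make every line unique:

:sort u
123 hello world
124 foo bar
125 hello world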

Upvotes: -1

kev

Reputation: 161834

This is a bash command:

sort -k2 input | uniq -s4
  • sort -k2 sorts on the 2nd field through the end of the line, so the leading number is ignored
  • uniq -s4 skips the first 4 characters when comparing adjacent lines (see the example below)
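
For example, with the sample data saved in a file named input (an assumed name):

$ sort -k2 input | uniq -s4
124 foo bar
123 hello world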

In vim, you can call the external command above:

:%!sort -k2 % | uniq -s4
  • the 2nd % will expand to the current file name, so sort reads the file from disk rather than from stdin.
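
If the buffer has unsaved changes, a variant that omits the file name should also work, since :%! feeds the buffer lines to the command's stdin:

:%!sort -k2 | uniq -s4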

Actually, you can sort in vim with this command:

:sort /^\d*\s/
  • vim skips the text matched by the pattern (the leading number and whitespace) when comparing lines
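
On the sample buffer, the result is ordered by the text after the number (the relative order of the two hello world lines may depend on your vim version):

124 foo bar
123 hello world
125 hello world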

After sorting, use this command to remove the duplicate lines (example below):

:%s/\v(^\d*\s(.*)$\n)(^\d*\s\2$\n)+/\1/
  • To avoid too much backslash escaping, I use \v in the pattern to turn on very magic mode.
  • In a multi-line pattern, $ matches the position right before a newline (\n). I don't think it's strictly necessary here, though.
  • You can craft your own regex.
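
Applied to the sorted buffer above, the substitution keeps only the first line of each run of duplicates:

124 foo bar
123 hello world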

Upvotes: 9

Guru

Reputation: 17004

Using awk:

$ awk '!a[$2$3]++' file
123 hello world
124 foo bar

The first time a key ($2$3, the 2nd and 3rd fields concatenated) enters the array, its count is 0, so the negation !a[$2$3] is true and the line is printed (awk's default action); the post-increment then sets the count to 1, so further occurrences of the same key evaluate to false and are skipped.
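
To see the counting in action, here is a small diagnostic (purely illustrative) that prints each key together with its count before the increment:

$ awk '{print $2 $3, a[$2$3]++}' file
helloworld 0
foobar 0
helloworld 1

Note that $2$3 concatenates the two fields with no separator, so lines like "ab c" and "a bc" would map to the same key; a[$2,$3] avoids that collision.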

Upvotes: 1

timwoj

Reputation: 388

I'm not sure about vim, but you could do something with the uniq command. It has a --skip-fields argument that skips the first N fields of each line when comparing.

$ cat test.txt
123 hello world
124 foo bar
125 hello world

$ cat test.txt | sort -k 2 | uniq --skip-fields=1 | sort
123 hello world
124 foo bar

Upvotes: 0
