Reputation: 5822
I have a text file that contains thousands of lines of text such as the following:
123 hello world
124 foo bar
125 hello world
I would like to test for duplicates by checking a sub-section of the line. For the above it should output:
123 hello world
124 foo bar
Is there a vim command that can do this?
Update: I am on a Windows machine, so I can't use uniq.
Upvotes: 6
Views: 4099
Reputation: 316
In Vim I was able to sort and remove duplicate lines with the following command:
:sort u
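For comparison, here is a shell sketch of what :sort u does (sort -u is its command-line analogue): both compare entire lines, so the question's near-duplicates, which differ only in their leading numbers, all survive.

```shell
# sort -u deduplicates whole lines, like Vim's ':sort u'.
# All three sample lines differ in their leading number,
# so none of them are removed.
printf '123 hello world\n124 foo bar\n125 hello world\n' | sort -u
# → 123 hello world
#   124 foo bar
#   125 hello world
```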
Upvotes: -1
Reputation: 161834
This is a bash command:
sort -k2 input | uniq -s4
sort -k2 will sort on the 2nd field onward, skipping the 1st field.
uniq -s4 will skip the leading 4 characters when comparing lines.
In Vim, you can call the external command above:
:%!sort -k2 % | uniq -s4
% will expand to the current file name.
Actually, you can sort in Vim itself with this command:
:sort /^\d*\s/
After sorting, use this command to remove duplicated lines:
:%s/\v(^\d*\s(.*)$\n)(^\d*\s\2$\n)+/\1/
\v in the pattern turns on VERY MAGIC. $ matches the position right before a newline (\n); I don't think it's necessary here, though.
Upvotes: 9
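The sort/uniq pipeline above can be checked quickly on the question's sample data (the filename sample.txt is made up here):

```shell
# Run the pipeline on the sample data from the question.
# sort -k2 orders by the text after the line number; uniq -s4 skips
# the 4 leading characters ("123 ") when comparing adjacent lines.
printf '123 hello world\n124 foo bar\n125 hello world\n' > sample.txt
sort -k2 sample.txt | uniq -s4
# → 124 foo bar
#   123 hello world
```

Note that -s4 only works because every line number here is exactly 3 digits plus a space; uniq -f1 (skip one whole field) avoids that assumption.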
Reputation: 17004
Using awk:
$ awk '!a[$2$3]++' file
123 hello world
124 foo bar
The first occurrence of a key finds a count of 0, so the negated test is true and the line is printed; the count is then incremented, so later occurrences of the same key test false and are skipped.
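One caveat worth noting (my addition, not part of the answer): $2$3 joins the two fields with no separator, so different field splits can collide on the same key; inserting FS between them avoids that.

```shell
# Hypothetical collision example: 'ab c' and 'a bc' both yield the
# key "abc" when $2 and $3 are concatenated directly.
printf '1 ab c\n2 a bc\n' | awk '!a[$2$3]++'       # → 1 ab c
# Joining the fields with FS keeps the keys distinct:
printf '1 ab c\n2 a bc\n' | awk '!a[$2 FS $3]++'   # → both lines
```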
Upvotes: 1
Reputation: 388
I'm not sure about doing it in Vim, but you could do something with the uniq command. It has a --skip-fields argument that can be used to skip the first field(s) of each line when comparing.
$ cat test.txt
123 hello world
124 foo bar
125 hello world
$ sort -k 2 test.txt | uniq --skip-fields=1 | sort
123 hello world
124 foo bar
Upvotes: 0