Reputation: 2093

In Vim, how to remove all lines that are duplicate somewhere

I have a file that contains lines as follows:

one one
one one
two two two
one one
three three
one one
three three
four

I want to remove all occurrences of the duplicate lines from the file and leave only the non-duplicate lines. So, in the example above, the result should be:

two two two
four

I saw this answer to a similar looking question. I tried to modify the ex one-liner as given below:

:syn clear Repeat | g/^\(.*\)\n\ze\%(.*\n\)*\1$/exe 'syn match Repeat "^' . escape(getline ('.'), '".\^$*[]') . '$"' | d

But it does not remove all occurrences of the duplicate lines; it removes only some of them.

How can I do this in Vim? More specifically, how can I do this with ex commands in Vim?

To clarify, I am not looking for :sort u.

Upvotes: 6

Answers (9)

jianyu

Reputation: 1

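1. Add line numbers so that you can restore the order after sorting: :%s/^/\=printf("%d ", line("."))/
2. Sort on the text that follows the line number: :sort /^\d\+ /
3. Remove duplicate lines: :%s/^\d\+ \(.*\)\n\%(\d\+ \1\n\)\+//
4. Restore the original order (numerically, so that 10 sorts after 9): :sort n
5. Remove the line numbers added in step 1: :%s/^\d\+ //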
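The same number, sort, filter, re-sort, strip sequence can also be sketched as a shell filter. This is only an illustration, assuming a UNIX-like system where cat -n, sort, uniq -f and cut are available; the function name unique_keep_order is made up for the example.

```shell
# Hypothetical helper: reads stdin, prints the lines that occur exactly once,
# in their original order.
unique_keep_order() {
  cat -n |                 # 1. prefix every line with its number and a tab
    LC_ALL=C sort -k2 |    # 2. sort on the text after the number
    uniq -u -f1 |          # 3. keep unrepeated lines, ignoring the number field
    sort -n |              # 4. restore the original order
    cut -f2-               # 5. strip the number again
}

printf 'b\na\nb\nc\n' | unique_keep_order
```

From Vim, the pipeline inside the function body could be used as a filter after :%! on the whole buffer.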

Upvotes: 0

cn8341

Reputation: 129

Please use Perl; it can do this easily!

use strict;
use warnings;
use diagnostics;

# Read the input file into memory.
open(my $in, '<', 'input.txt') or die "cannot open input.txt: $!\n";
my @lines = <$in>;
close($in);

# Count how many times each line occurs.
my %count;
$count{$_}++ for @lines;

# Print only the lines that occur exactly once, in their original order.
open(my $out, '>', 'output.txt') or die "cannot open output.txt: $!\n";
foreach my $line (@lines) {
    print $out $line if $count{$line} == 1;
}
close($out);

Upvotes: -1

Ingo Karkat

Reputation: 172648

My PatternsOnText plugin version 1.30 now has a

:DeleteAllDuplicateLinesIgnoring

command. Without any arguments, it'll work as outlined in your question.

Upvotes: 1

benjifisher

Reputation: 5122

It does not preserve the order of the remaining lines, but this seems to work:

:sort|%s/^\(.*\)\n\%(\1\n\)\+//

(This version is @Peter Rincker's idea, with a little correction from me.) On vim 7.3, the following even shorter version works:

:sort | %s/^\(.*\n\)\1\+//

Unfortunately, due to differences between the regular-expression engines, this no longer works in vim 7.4 (including patches 1-52).

Upvotes: 1

benjifisher

Reputation: 5122

This is not any simpler than @Ingo Karkat's answer, but it is a little more flexible. Like that answer, this leaves the remaining lines in the original order.

function! RepeatedLines(...)
  let first = a:0 ? a:1 : 1
  let last = (a:0 > 1) ? a:2 : line('$')
  let lines = []
  for line in range(first, last - 1)
    if index(lines, line) != -1
      continue
    endif
    let newlines = []
    " Escape the backslash and the / delimiter, then match whole lines only.
    let text = escape(getline(line), '\/')
    execute 'silent' (line + 1) ',' last
      \ 'g/\V\^' . text . '\$/call add(newlines, line("."))'
    if !empty(newlines)
      call add(lines, line)
      call extend(lines, newlines)
    endif
  endfor
  " Sort numerically; the default would compare the numbers as strings.
  return sort(lines, 'n')
endfun
:for x in reverse(RepeatedLines()) | execute x 'd' | endfor

A few notes:

My function accepts arguments instead of handling a range. It defaults to the entire buffer.
This illustrates some of the functions for manipulating lists. :help list-functions
I use /\V (very nomagic) so the only characters I need to escape in the search pattern are the backslash and the / delimiter; \^ and \$ anchor the match to the whole line. :help /\V

Upvotes: 0

Don Cruickshank

Reputation: 5948

If you have access to UNIX-style commands, you could do:

:%!sort | uniq -u

The -u option to the uniq command performs the task you require. From the uniq command's help text:

   -u, --unique
          only print unique lines

Note, however, that this sorts the buffer, so the remaining lines will not keep the order they had in your input file.

Upvotes: 5

Ingo Karkat

Reputation: 172648

Taking the code from here and modifying it to delete the lines instead of highlighting them, you'll get this:

function! DeleteDuplicateLines() range
  " First pass: count how often each non-empty line occurs in the range.
  let lineCounts = {}
  for lineText in getline(a:firstline, a:lastline)
    if lineText != ""
      let lineCounts[lineText] = get(lineCounts, lineText, 0) + 1
    endif
  endfor
  " Second pass: delete from the bottom up, so that the line numbers
  " still to be visited are not shifted by earlier deletions.
  let lineNum = a:lastline
  while lineNum >= a:firstline
    let lineText = getline(lineNum)
    if lineText != "" && lineCounts[lineText] > 1
      execute lineNum . 'delete _'
    endif
    let lineNum -= 1
  endwhile
endfunction

command! -range=% DeleteDuplicateLines <line1>,<line2>call DeleteDuplicateLines()

Upvotes: 0

romainl

Reputation: 196751

Assuming you are on a UNIX derivative, the command below should do what you want:

:sort | %!uniq -u

uniq only compares adjacent lines, so we must sort them first. Using Vim's built-in :sort command saves some typing: it works on the whole buffer by default, so we don't need to pass it a range, and because it is a built-in command we don't need the !.

Then we filter the whole buffer through uniq -u.
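To illustrate why the sort step matters, here is a small shell sketch that feeds the sample lines from the question to uniq -u with and without sorting. The helper name sample is only for the example, and LC_ALL=C is an assumption made to keep the sort order predictable.

```shell
# Hypothetical helper that prints the sample lines from the question.
sample() {
  printf 'one one\none one\ntwo two two\none one\nthree three\none one\nthree three\nfour\n'
}

# Without sorting, only the adjacent pair of "one one" lines is dropped;
# the later, non-adjacent repeats survive.
sample | uniq -u

echo '---'

# With sorting, all repeats become adjacent, so only "four" and
# "two two two" remain.
sample | LC_ALL=C sort | uniq -u
```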

Upvotes: 2

Kent

Reputation: 195169

If you are on a Linux box with awk available, this line works for your needs:

:%!awk '{a[$0]++}END{for(x in a)if(a[x]==1)print x}'
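Because awk's for (x in a) loop visits keys in an unspecified order, the command above may reorder the remaining lines. If the original order matters, a possible variant is to remember each line by its record number and replay them at the end. This is a minimal sketch; the function name keep_unique_in_order is made up for the example.

```shell
# Hypothetical helper: reads stdin, prints the lines that occur exactly once,
# in their original order.
keep_unique_in_order() {
  awk '{ count[$0]++; line[NR] = $0 }
       END { for (i = 1; i <= NR; i++) if (count[line[i]] == 1) print line[i] }'
}

printf 'one one\none one\ntwo two two\none one\nthree three\none one\nthree three\nfour\n' | keep_unique_in_order
```

With the sample from the question this prints "two two two" followed by "four". From Vim, the awk program could be used as a filter on the whole buffer in the same way as the one-liner above.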

Upvotes: 3
