Regex: Select All Duplicate Lines

Question

Ok, I've been playing with this for a while and have gotten close, but still cannot pull it off.

I want to go from:

a
a
b
a
c
a

to (In Notepad++):

b
c

I can already get (one copy of each line kept):

a
b
c

Here are my best fails so far, but you get the idea:

^(((.+)(\r?\n))(?:(?!\1).*\s*)?)((?:(?!\2).*\s*)?(\2))+
^((.+)(\r?\n))((?:(?!\1).*\s*)?(\1))+

From Regexr, I just want 'test line'.

New closest attempt:

^((.+)(\r?\n))(?=(.+)(\r?\n))?(\1)+

bobble bubble · Accepted Answer

For those who haven't read through the comments: the idea is to use Notepad++ to filter an IP blacklist out of a full IP list, by dropping the blacklist into the full list and then completely removing every line that occurs more than once anywhere in the file.

This could be done with a variable-length lookbehind, but Notepad++ does not support that.

A workaround, which is also more efficient:

Sort the lines by use of Plugin TextFX Character (select all first), so that identical lines end up next to each other.
Use a simple pattern like ^(.+)\R(?:\1(?:\R|$))+ with an empty replacement to remove the now-consecutive duplicate lines.

^ matches at line start.
(.+)\R captures one or more characters into \1, followed by an \R line break.
(?:\1(?:\R|$))+ then requires one or more repetitions of \1, each followed by either a line break or the $ end.
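
The same sort-then-remove idea can be tried outside Notepad++. Below is a minimal Python sketch, not the answer's own method: the function name drop_duplicated_lines is made up for illustration, and because Python's re module has no \R, the lines are first re-joined with plain \n and the pattern uses \n in its place.

```python
import re

# Assumed equivalent of ^(.+)\R(?:\1(?:\R|$))+ once all line breaks are "\n".
DUPLICATE_RUN = re.compile(r"^(.+)\n(?:\1(?:\n|$))+", re.MULTILINE)


def drop_duplicated_lines(text: str) -> str:
    """Remove every line that occurs more than once (hypothetical helper)."""
    # Step 1: sort, so identical lines become consecutive.
    sorted_text = "\n".join(sorted(text.splitlines()))
    # Step 2: delete each run of two or more identical consecutive lines.
    return DUPLICATE_RUN.sub("", sorted_text)


if __name__ == "__main__":
    print(drop_duplicated_lines("a\na\nb\na\nc\na"))
```

With the question's sample input this leaves only the lines b and c. If a duplicated run is the last thing in the text, the newline in front of it is left behind, so a trailing line break may remain.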
