ThatCampbellKid

Reputation: 523

Regex: Select All Duplicate Lines

Ok, I've been playing with this for a while and have gotten close, but still cannot pull it off.

I want to go from:

a
a
b
a
c
a

to (In Notepad++):

b
c

So far I can only get to:

a
b
c

Here are my best fails so far, but you get the idea:

^(((.+)(\r?\n))(?:(?!\1).*\s*)?)((?:(?!\2).*\s*)?(\2))+
^((.+)(\r?\n))((?:(?!\1).*\s*)?(\1))+

In the linked Regexr example, I just want 'test line' to remain.

New closest attempt:

^((.+)(\r?\n))(?=(.+)(\r?\n))?(\1)+

Upvotes: 1

Views: 150

Answers (2)

bobble bubble

Reputation: 18535

For those who haven't read through the comments: the idea is to use Notepad++ to filter an IP list against a blacklist by pasting the blacklist into the full IP list and then completely removing every line that occurs more than once anywhere in the file.

Doing this in a single regex would need a variable-length lookbehind, which is not supported in Notepad++.

A workaround, which is also more efficient:

  1. Sort the lines with Plugin TextFX Character (select all first).
  2. Use a simple pattern like ^(.+)\R(?:\1(?:\R|$))+ and replace with nothing to remove every run of consecutive duplicate lines.
  • ^ matches the start of a line.
  • (.+)\R captures one or more characters into \1, followed by an \R line break.
  • (?:\1(?:\R|$))+ matches one or more repetitions of \1, each followed by a line break or the $ end.
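To see the sort-then-replace idea outside Notepad++, here is a minimal Python sketch. It is an illustration only: Python's `re` module has no `\R`, so the pattern uses `\n` instead and assumes Unix line endings, and `drop_duplicated_lines` is a made-up helper name.

```python
import re

# Same shape as the Notepad++ pattern, with \n standing in for \R
# (assumption: the input uses "\n" line endings).
DUP_RUN = re.compile(r"^(.+)\n(?:\1(?:\n|$))+", re.MULTILINE)

def drop_duplicated_lines(text: str) -> str:
    """Sort the lines, then delete every run of identical consecutive lines."""
    # Sorting puts all copies of a line next to each other.
    sorted_text = "\n".join(sorted(text.splitlines()))
    # Each match covers a whole run of duplicates, so the run is removed entirely.
    return DUP_RUN.sub("", sorted_text)

print(drop_duplicated_lines("a\na\nb\na\nc\na"))
# prints:
# b
# c
```

Sorting first is what makes the simple pattern sufficient: once all copies are adjacent, one backreference can match the whole run and no lookbehind is needed.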

Upvotes: 1

yoga

Reputation: 860

This task should not be done with regex, IMHO. It is better handed over to a programming language. I am posting one of a gazillion possible solutions, this one based on the shell:

sort file.txt | uniq -d

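This command sorts the file and prints one copy of each line that occurs more than once. To print only the lines that occur exactly once, which is the output asked for in the question, use uniq -u instead of uniq -d.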
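For comparison, the same kind of filtering can be sketched in a few lines of Python. This is one possible approach, not the only one: it keeps the lines that occur exactly once (the result the question asks for) and preserves their original order instead of sorting. `unique_lines` is a hypothetical helper name.

```python
from collections import Counter

def unique_lines(text: str) -> list:
    """Return the lines that occur exactly once, in their original order."""
    lines = text.splitlines()
    counts = Counter(lines)  # maps each distinct line to its number of occurrences
    return [line for line in lines if counts[line] == 1]

print("\n".join(unique_lines("a\na\nb\na\nc\na")))
# prints:
# b
# c
```

Counting first and filtering second avoids the sort, which matters if the order of the surviving lines should stay as it was in the file.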

Upvotes: 0
