Super Sonic

Reputation: 116

Regex - remove similar strings

I have these lines:

http://dawn-ofthe-dead.blogspot.com/2008/02/amenra-hitch.html
http://dawn-ofthe-dead.blogspot.com/2008/0...enra-hitch.html
https://yadi.sk/mail/?hash=R041opeqcsTT3kmODt3qXcIAmcxOx1P78E1PqDOqJR8%3D
https://yadi.sk/mail/?hash=R041opeqcsTT3kmO...78E1PqDOqJR8%3D
https://mail.yandex.ru/message_part/2011%20-%20Amenra%20%26%20Oathbreaker%20(Split).rar?name=2011%20-%20Amenra%20%26%20Oathbreaker%20(Split).rar&hid=1.3&ids=2440000004701735584
https://mail.yandex.ru/message_part/2011%20..000004701735584
http://mediaboom.org/mp3/127749-amenra-mass-i-prayer-i-vi-2003.html
http://mediaboom.org/mp3/127749-amenra-mas....-i-vi-2003.html

I want to remove the lines that contain

..
...
....

because they are truncated near-duplicates of the full URLs.

I want this output:

http://dawn-ofthe-dead.blogspot.com/2008/02/amenra-hitch.html
https://yadi.sk/mail/?hash=R041opeqcsTT3kmODt3qXcIAmcxOx1P78E1PqDOqJR8%3D
https://mail.yandex.ru/message_part/2011%20-%20Amenra%20%26%20Oathbreaker%20(Split).rar?name=2011%20-%20Amenra%20%26%20Oathbreaker%20(Split).rar&hid=1.3&ids=2440000004701735584
http://mediaboom.org/mp3/127749-amenra-mass-i-prayer-i-vi-2003.html

How can I do this with a regex? (I'm using Notepad++.)

Upvotes: 0

Views: 140

Answers (2)

Toto

Reputation: 91518

To remove all lines that contain two or more consecutive dots, I'd do:

  • Ctrl+H
  • Find what: ^.*\.\.+.*\R?
  • Replace with: (leave empty)
  • Search mode: Regular expression, with ". matches newline" unchecked
  • Replace all
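
Outside Notepad++, the same filter can be sketched in a few lines of Python. This is only an illustration, assuming the URLs are already available as a list of strings; the names `TRUNCATED` and `drop_truncated` are made up for this example. Python's `re` module does not support `\R`, so the sketch filters line by line instead of matching the line ending.

```python
import re

# A run of two or more consecutive dots marks a line as a truncated URL.
TRUNCATED = re.compile(r"\.{2,}")


def drop_truncated(lines):
    """Return only the lines that contain no run of two or more dots."""
    return [line for line in lines if not TRUNCATED.search(line)]


sample = [
    "http://mediaboom.org/mp3/127749-amenra-mass-i-prayer-i-vi-2003.html",
    "http://mediaboom.org/mp3/127749-amenra-mas....-i-vi-2003.html",
    "https://mail.yandex.ru/message_part/2011%20..000004701735584",
]

for url in drop_truncated(sample):
    print(url)
```

Like the regex above, this also drops any complete URL that happens to contain two dots in a row, so it is only safe when that cannot occur in the real data.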

Upvotes: 4

Sebastian Proske

Reputation: 8413

Provided that the line you want to remove always directly follows its full counterpart (the line without the dots), you can use the following (make sure "Regular expression" is selected in the Notepad++ Replace dialog):

Search pattern: ^(.{25,})(.*)$\R\1.*

Replace pattern: $1$2

This looks for at least 25 characters at the start of a line that are repeated at the start of the next line, and removes that second line. You can of course replace 25 with whatever number you feel is appropriate to avoid false positives.
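
For comparison, here is a rough Python sketch of the same idea, not an exact translation of the regex. It assumes the lines are held in a list, and the function name `drop_prefix_duplicates` and the parameter `min_prefix` are invented for this example. Each line is compared with the most recently kept line, and skipped if the two share a prefix of at least `min_prefix` characters.

```python
import os


def drop_prefix_duplicates(lines, min_prefix=25):
    """Drop a line when it shares a long prefix with the last kept line."""
    kept = []
    for line in lines:
        if kept:
            # os.path.commonprefix compares strings character by character.
            shared = os.path.commonprefix([kept[-1], line])
            if len(shared) >= min_prefix:
                continue  # treated as a truncated copy of the kept line
        kept.append(line)
    return kept


sample = [
    "http://dawn-ofthe-dead.blogspot.com/2008/02/amenra-hitch.html",
    "http://dawn-ofthe-dead.blogspot.com/2008/0...enra-hitch.html",
    "http://mediaboom.org/mp3/127749-amenra-mass-i-prayer-i-vi-2003.html",
    "http://mediaboom.org/mp3/127749-amenra-mas....-i-vi-2003.html",
]

for url in drop_prefix_duplicates(sample):
    print(url)
```

As with the regex, two genuinely different URLs that sit next to each other and share a long prefix would be collapsed into one, so `min_prefix` needs to be chosen with the actual data in mind.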

Upvotes: 2
