I have a list like this in Notepad++:
V - Visitors 2009 - S01e11-12.torrent
V - Visitors (2009) S02e04.torrent
V - Visitors (2009) S01e01-12.torrent
V S02e02.torrent
V S02e05.torrent
Valentina S01e01-13.torrent
Valeria Medico Legale S01-02e01-16.torrent
Veep - Season 1 BDMux.torrent
Veep - Season 2 BDMux.torrent
Veep - Season 3.torrent
Veep - Season 4.torrent
Vegas S01e01-21.torrent
Velvet S01e13.torrent
Velvet S01e15.torrent
Vikings.S03E03.torrent
Vikings.S03E04.torrent
Vikings.S03E05.torrent
Velvet_S03e02.torrent
Velvet_S03e03.torrent
Velvet_S03e04.torrent
I want a regex that deletes the lines repeating the first (and second) word of the line above (Veep - Veep), so that the final list looks like this:
V - Visitors 2009 - S01e11-12.torrent
V S02e02.torrent
Valentina S01e01-13.torrent
Valeria Medico Legale S01-02e01-16.torrent
Veep - Season 1 BDMux.torrent
Vegas S01e01-21.torrent
Velvet S01e13.torrent
So if I have:
Veep - Season 1 BDMux.torrent
Veep - Season 2 BDMux.torrent
I want only the first line:
Veep - Season 1 BDMux.torrent
Do a regular expression find/replace (Search Mode: Regular expression) like this:
Find what:
^([^ _.-]+[ _.-]+([^ _.-]++)?)(.*?\R)(\1.*?\R)+
Replace with:
\1\3
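If you want to try the same find/replace outside Notepad++, here is a minimal Python sketch of it. It is an illustration, not the answer's own method, and it assumes two translations because Python's re module differs from the Boost engine Notepad++ uses: \R is spelled out as (?:\r\n|\n|\r), and the possessive ++ (only supported by re from Python 3.11) is replaced by the lookahead (?![^ _.-]), which likewise forces the optional second word to be matched whole or not at all. The names LB, DUPLICATE_RUN and keep_first_of_each_series are hypothetical.

```python
import re

# Python translation of the Notepad++ (Boost) pattern
#   ^([^ _.-]+[ _.-]+([^ _.-]++)?)(.*?\R)(\1.*?\R)+
# Assumptions: \R becomes (?:\r\n|\n|\r), and the possessive ++ is
# emulated with the negative lookahead (?![^ _.-]).
LB = r"(?:\r\n|\n|\r)"
DUPLICATE_RUN = re.compile(
    r"^([^ _.-]+[ _.-]+([^ _.-]+(?![^ _.-]))?)"  # \1: first word, separator(s), optional second word
    r"(.*?" + LB + r")"                           # \3: rest of the first line, with its line break
    r"(\1.*?" + LB + r")+",                       # following lines that start with \1
    re.MULTILINE,
)


def keep_first_of_each_series(text: str) -> str:
    """Collapse each run of consecutive lines sharing a prefix to its first line."""
    # Every line of a run must end in a line break to match, so add one
    # to the last line if it is missing (an assumption about the input).
    if not text.endswith(("\n", "\r")):
        text += "\n"
    # The replacement \1\3 rebuilds only the first line of each run.
    return DUPLICATE_RUN.sub(r"\1\3", text)


torrents = """V - Visitors 2009 - S01e11-12.torrent
V - Visitors (2009) S02e04.torrent
V - Visitors (2009) S01e01-12.torrent
V S02e02.torrent
V S02e05.torrent
Valentina S01e01-13.torrent
Valeria Medico Legale S01-02e01-16.torrent
Veep - Season 1 BDMux.torrent
Veep - Season 2 BDMux.torrent
Veep - Season 3.torrent
Veep - Season 4.torrent
Vegas S01e01-21.torrent
Velvet S01e13.torrent
Velvet S01e15.torrent
Vikings.S03E03.torrent
Vikings.S03E04.torrent
Vikings.S03E05.torrent
Velvet_S03e02.torrent
Velvet_S03e03.torrent
Velvet_S03e04.torrent"""

print(keep_first_of_each_series(torrents), end="")
```

On the list from the question this prints the seven lines of the desired output, followed by Vikings.S03E03.torrent and Velvet_S03e02.torrent, because the Vikings. and Velvet_ runs are collapsed the same way ("Velvet_" is a different prefix from "Velvet "). The trailing line break is added because, without it, the last line of a run cannot match \R; the same is likely true in Notepad++ when the file does not end with a newline.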
Explanation
^([^ _.-]+[ _.-]+([^ _.-]++)?)
deals with getting the first word on a line followed by one or more of the separators " ", "_", "." or "-". The second word, ([^ _.-]++)?, is optional to accommodate the Velvet example ("Velvet S01e13" and "Velvet S01e15" share only their first word); the possessive ++ makes it match the whole word or nothing. This whole part is captured into \1, and because the second word is a group nested inside it, it becomes group 2.
(.*?\R)
captures everything up to and including the line break (\R) into \3 for later reuse.
(\1.*?\R)+
matches all following lines that begin with whatever was captured in \1.
\1\3
only reconstructs the first line, thus deleting the following lines.
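To see what each group holds, here is a small Python sketch (an illustration, not part of the answer) that runs the same pattern on the Veep and Velvet examples and returns groups 1, 3 and 4. Python's re has no \R, so it is written (?:\r\n|\n|\r), and the possessive ++ is emulated with the lookahead (?![^ _.-]); LB, PATTERN and show_groups are hypothetical names. Group 2 is left out because the replacement does not use it.

```python
import re

# Hypothetical Python rendering of the Notepad++ pattern
#   ^([^ _.-]+[ _.-]+([^ _.-]++)?)(.*?\R)(\1.*?\R)+
LB = r"(?:\r\n|\n|\r)"  # Python's re has no \R
PATTERN = re.compile(
    r"^([^ _.-]+[ _.-]+([^ _.-]+(?![^ _.-]))?)"  # \1, with the optional second word as \2
    r"(.*?" + LB + r")"                           # \3: rest of the first line
    r"(\1.*?" + LB + r")+",                       # \4: repeated per following line (keeps the last one)
    re.MULTILINE,
)


def show_groups(text: str):
    """Return groups 1, 3 and 4 of the first match, or None if nothing matches."""
    m = PATTERN.search(text)
    return m.group(1, 3, 4) if m else None


veep = "Veep - Season 1 BDMux.torrent\nVeep - Season 2 BDMux.torrent\n"
velvet = "Velvet S01e13.torrent\nVelvet S01e15.torrent\n"

print(show_groups(veep))
# ('Veep - Season', ' 1 BDMux.torrent\n', 'Veep - Season 2 BDMux.torrent\n')
print(show_groups(velvet))
# ('Velvet ', 'S01e13.torrent\n', 'Velvet S01e15.torrent\n')
```

For Velvet the optional second word is dropped, so \1 is just "Velvet " and the following line still matches. Group 4 holds only the last of the repeated lines, which does not matter because the replacement \1\3 discards it.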