Reputation: 116
I have this example list:
Veep - Season 1 BDMux.torrent
Vegas S01e01-21.torrent
Velvet S01e13.torrent
Velvet.e10.torrent
Velvet_e01.torrent
Veronica Mars s01.torrent
Vicious S01e01-06.torrent
Victor Ros S01e01-06.torrent
Video.Game.High.School.S01e01-09.XviD.torrent
Vikings - Season 1 EXT.torrent
Vikings_S04e04.avi.torrent
I want to eliminate similar lines, such as the Velvet. and Velvet_ ones, consolidate them into one entry per show, and finally print the result like this:
Veep
Vegas
Velvet
Veronica Mars
Victor Ros
Video Game High School
Vikings
How can I do this with a regex?
Upvotes: 0
Views: 100
Reputation: 8332
Doing all of that in one regex is, I'd say, impossible. However, this regex
^(.*?)[ ._-]*(?:s\w*\s*\d+)?(?:e\d\d(?:-\d\d)?)?[\s.]*\w*?\.torrent(?:[\s\S]*\1.*$)*$
handles what you threw at us ;). There's one catch, though: it can't remove the dots in titles like Video.Game.High.School.
It also requires the shows to be grouped, as in your example (e.g. all Velvet lines together). That ought to be easily solved with Notepad++'s Edit > Line Operations > Sort Lines in Ascending, though.
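If you would rather script those two chores than do them by hand, here is a minimal Python sketch. It is not part of the regex above: `sort_listing` and `tidy_title` are hypothetical helper names, and treating every dot or underscore in a title as a word separator is an assumption that will mangle titles containing a real dot.

```python
import re


def sort_listing(lines):
    """Sort case-insensitively so all entries of one show end up adjacent."""
    return sorted(lines, key=str.casefold)


def tidy_title(title):
    """Replace runs of dots/underscores with one space (assumed separators)."""
    return re.sub(r"[._]+", " ", title).strip()


if __name__ == "__main__":
    unsorted = [
        "Vikings_S04e04.avi.torrent",
        "Velvet_e01.torrent",
        "Vikings - Season 1 EXT.torrent",
        "Velvet.e10.torrent",
    ]
    for line in sort_listing(unsorted):
        print(line)
    print(tidy_title("Video.Game.High.School"))
```

Run `sort_listing` before the regex and `tidy_title` on each title that comes out of it.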
Check it out here at regex101.
It captures everything up to the season and/or episode marker, allows for an optional format tag, and finally matches .torrent. It then optionally matches everything up to a later repeat of the captured text, plus whatever follows to the end of that line. That last step is repeated until no further match is found. The capture group now holds the name of the show, while the overall match spans all lines of that show. Replacing the whole match with the capture therefore leaves a single clean entry for each show.
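For anyone who wants to try the replace-with-capture step outside an editor, here is a rough Python sketch. The pattern is the one above, unchanged; the flags are my assumption (case-insensitive so that s/e match S01e01, multiline so that ^ and $ work per line), and `consolidate` is just an illustrative name.

```python
import re

# The regex from this answer, verbatim. Flags are an assumption:
# IGNORECASE for "S01e01" vs "s"/"e", MULTILINE for per-line ^ and $.
SHOW_BLOCK = re.compile(
    r"^(.*?)[ ._-]*(?:s\w*\s*\d+)?(?:e\d\d(?:-\d\d)?)?[\s.]*\w*?\.torrent"
    r"(?:[\s\S]*\1.*$)*$",
    re.IGNORECASE | re.MULTILINE,
)

SAMPLE = "\n".join([
    "Veep - Season 1 BDMux.torrent",
    "Vegas S01e01-21.torrent",
    "Velvet S01e13.torrent",
    "Velvet.e10.torrent",
    "Velvet_e01.torrent",
    "Veronica Mars s01.torrent",
    "Vicious S01e01-06.torrent",
    "Victor Ros S01e01-06.torrent",
    "Video.Game.High.School.S01e01-09.XviD.torrent",
    "Vikings - Season 1 EXT.torrent",
    "Vikings_S04e04.avi.torrent",
])


def consolidate(listing: str) -> str:
    """Replace each block of lines for one show with the captured show name."""
    return SHOW_BLOCK.sub(r"\1", listing)


if __name__ == "__main__":
    print(consolidate(SAMPLE))
```

On the sample list this should leave one line per show, with Video.Game.High.School still carrying its dots, as noted earlier.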
This means that it won't cope when a show's name starts with the complete name of another show, e.g. American Crime and American Crime Story, since the first would match the second and therefore keep matching until the end of the second. This can be fixed by including the test for season/episode in the second part of the regex, but I opted out of that to keep it simpler and faster.
So, you say in a comment "regex does not need to be perfect". Well, here's one that gets most of the job done for you - but isn't perfect.
Regards
Edit
Made some updates and simplified the regex considerably. Here's the old one, in case you want the more specific version:
^(.*?)[ ._]?(?:-? season \d+|(?:s\d\d)?(?:e\d\d(?:-\d\d)?)?)[\s.]*(?:bdmux|xvid|ext|avi)?\.torrent(?:[\s\S]*\1.*$)*$
Upvotes: 0