Wouter

Reputation: 21

Regex to match only strings ending with a specific word and not containing other words

I am trying to write a regex that matches a string ending with the word 'Remixes', but only when it is not preceded by certain words or characters. I came up with the following two regexes; they give different results, but neither matches correctly:

^(\w+)((?!\&|\+|And|The|Of|Various|House|Unreleased|Selected).)\s(Remixes)$

This excludes the keywords, but it does not work when the string contains multiple words, like Think Twice Remixes, or when an excluded keyword is the only preceding word, like Various Remixes.

^(.*)((?!\&|\+|And|The|Of|Various|House|Unreleased|Selected).)\s(Remixes)$

This excludes the test example Fill Me Up + Remixes, but not other examples containing the excluded keywords, such as Sides & Remixes.

How can I make the first regex match strings with multiple preceding words, and not match when an excluded word is the first and only preceding word?

Upvotes: 2

Views: 620

Answers (1)

Sobrique

Reputation: 53508

Honestly, I wouldn't. Regex is a powerful tool, and you can do a lot with it, but your code becomes much simpler and clearer when you don't try to "single-regex" every problem.

For your example, I would be quite tempted to use Perl's grep function, which lets you specify compound conditions:

 my @filtered = grep { m/Remixes$/ 
                     and not   
                        m/\b(And
                             |The
                             |Of
                             |Various
                             |House
                             |Unreleased
                             |Selected
                         )\s*.?\s+Remixes/xi } @list_of_things;

E.g.:

#!/usr/bin/env perl
use strict;
use warnings;

#set up a list of words to exclude when prefixing "Remix"
#qw is perl's "quote words" and lets you specify whitespace delimited values. 
my @exclude_remix_prefix = qw ( And
    The
    Of
    Various
    House
    Unreleased
    Selected );

#turn that into a sub regex (qr 'compiles' a regex). 
my $exclude = join( "|", @exclude_remix_prefix );
#\b keeps "Of" from matching inside a longer word such as "Soundproof"
$exclude = qr/\b($exclude)\s+Remixes/i;

#read from the <DATA> filehandle, 
#but you could use <> to read from STDIN/filenames like 'sed/grep' do. 
my @filtered = grep { m/Remixes$/i and not m/$exclude/i; } <DATA>;

print @filtered;

__DATA__
Fill Me Up + Remixes
Sides & Remixes
Something Selected remixes

Output:

Fill Me Up + Remixes
Sides & Remixes

(Give me some samples of what should/shouldn't be matched, and I will expand)
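The patterns in the question also reject `&` and `+` directly before Remixes, which the word list above lets through. A minimal sketch of one way to fold those in, assuming they should be treated like the excluded words: the `keep_title` helper and the sample titles are hypothetical, and `quotemeta` is used so that `+` is matched literally.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Words and symbols that must not directly precede "Remixes".
# '&' and '+' are taken from the patterns in the question.
my @exclude_words   = qw( And The Of Various House Unreleased Selected );
my @exclude_symbols = ( '&', '+' );

# quotemeta escapes regex metacharacters such as '+'.
my $words   = join '|', map { quotemeta($_) } @exclude_words;
my $symbols = join '|', map { quotemeta($_) } @exclude_symbols;

# \b stops "Of" from matching inside a longer word such as "Soundproof".
my $exclude = qr/(?:\b(?:$words)|(?:$symbols))\s+Remixes$/i;

# Hypothetical helper: 1 if the title ends in "Remixes" and is not excluded.
sub keep_title {
    my ($title) = @_;
    return $title =~ m/Remixes$/i && $title !~ $exclude ? 1 : 0;
}

# Made-up sample titles, partly borrowed from the question.
my @titles = (
    'Think Twice Remixes',
    'Various Remixes',
    'Fill Me Up + Remixes',
    'Sides & Remixes',
    'Soundproof Remixes',
    'Something Selected remixes',
    'The Beauty And The Beast Remixes',
);

my @filtered = grep { keep_title($_) } @titles;
print "$_\n" for @filtered;
```

With these sample titles it keeps Think Twice Remixes, Soundproof Remixes and The Beauty And The Beast Remixes. The filter is still two simple conditions rather than one regex with lookarounds.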

We're probably straying a bit from your original use case, but if you want to create a transform pattern:

#!/usr/bin/env perl
use strict;
use warnings;

use Data::Dumper;

my @exclude_remix_prefix = qw ( And
    The
    Of
    Various
    House
    Unreleased
    Selected );

my $exclude = join( "|", @exclude_remix_prefix );
#\b keeps "Of" from matching inside a longer word such as "Soundproof"
$exclude = qr/\b($exclude)\s+Remixes/i;

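#the parentheses matter: ?: binds tighter than the comma, so without them
#an excluded line would still contribute one element and break the key/value pairing.
my %transform = map { m/$exclude/ ? () : ( m/(.*)/, m/(.*)\s+Remixes/i ) } <DATA>;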
print Dumper \%transform; 

__DATA__
Euterpeh Remixes
The Beauty And The Beast Remixes
Think Twice Remixes
Stop And Reset Remixes

This generates a hash containing the following (Data::Dumper does not sort keys by default, so the order may vary):

$VAR1 = {
          'The Beauty And The Beast Remixes' => 'The Beauty And The Beast',
          'Think Twice Remixes' => 'Think Twice',
          'Euterpeh Remixes' => 'Euterpeh',
          'Stop And Reset Remixes' => 'Stop And Reset'
        };

Which you could perhaps use to generate a sequence of rename operations?

Or, if you just want to perform some operation in place, use a for loop:

for ( <DATA> ) { 
    chomp; 
    next if m/$exclude/; 
    print "rename ", m/(.*)\s+Remixes/, " ", m/(.*)/,"\n";
}

(OK, I know 'rename' isn't quite what you want to do, but ...)
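The loop above relies on `$exclude` and the `DATA` section from the earlier script. As a rough self-contained sketch of the same idea, the version below prints the original title first and the stripped title second, and quotes both. The `rename_pair` helper and the sample titles are hypothetical, and the printed "rename" lines are only illustrative text, not a real command.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @exclude_remix_prefix = qw( And The Of Various House Unreleased Selected );
my $exclude = join '|', @exclude_remix_prefix;
$exclude = qr/\b(?:$exclude)\s+Remixes$/i;

# Hypothetical helper: returns (old, new) for a title worth processing,
# or an empty list if the title is excluded or does not end in "Remixes".
sub rename_pair {
    my ($title) = @_;
    return if $title =~ $exclude;
    return unless $title =~ m/^(.*\S)\s+Remixes$/i;
    return ( $title, $1 );
}

# Made-up sample titles, partly borrowed from the examples above.
my @titles = (
    'Euterpeh Remixes',
    'Various Remixes',
    'Think Twice Remixes',
    'Stop And Reset Remixes',
);

for my $title (@titles) {
    # The list assignment yields 0 in scalar context when nothing is returned.
    my ( $old, $new ) = rename_pair($title) or next;
    print qq{rename "$old" "$new"\n};
}
```

Keeping the decision in a small function means the same check can feed a hash, a loop, or a dry-run printout.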

Upvotes: 1
