user2362525

Exclude a few words from a simple regex in PHP

I'm categorizing a few folders on my drives, and I want to weed out low-quality files using this regex (it works):

xvid|divx|480p|320p|DivX|XviD|DIVX|XVID|XViD|DiVX|DVDSCR|PDTV|pdtv|DVDRip|dvdrip|DVDRIP

However, some files are in high definition but still have DVD or XviD in their filenames, alongside 1080p, 720p, 1080i or 720i. I need a single regex that matches the terms above but excludes any filename containing 1080p, 720p, 1080i or 720i.

Upvotes: 1

Views: 1354

Answers (3)

Casimir et Hippolyte

Reputation: 89557

You can do it like this:

<pre><?php
$subjects = array('Arrival of the train at La Ciotat station.avi',
                  'Gardenator II - multi - DVDrip - 720i.mkv',
                  'The adventures of Roberto the bear - divx.avi',
                  'Tokyo’s Ginza District - dvdrip.mkv');

$pattern = '~(?(DEFINE)(?<excl>(?>d(?>vd(?>rip|scr)|ivx)|pdtv|xvid|320p|480p)))
             (?(DEFINE)(?<keep>(?>[^17]+?|1(?!080[ip])|7(?!20[ip]))))
             ^\g<keep>*\g<excl>\g<keep>*$  ~ix';

foreach ($subjects as $subject) {
    // Print only the filenames flagged as low quality.
    if (preg_match($pattern, $subject)) {
        echo $subject . "\n";
    }
}

The main advantage is that it avoids testing a lookahead at every character: the lookahead is only tried after a 1 or a 7.

Upvotes: 0

stema

Reputation: 92986

You can use a negative lookahead for this:

^(?!.*(?:1080p|720p|1080i|720i)).*(?:xvid|divx|480p|320p|DivX|XviD|DIVX|XVID|XViD|DiVX|DVDSCR|PDTV|pdtv|DVDRip|dvdrip|DVDRIP)

This matches any string containing one of your search terms, but fails if 1080p, 720p, 1080i or 720i also appears anywhere in the string.
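As a minimal sketch, the lookahead pattern might be applied in PHP as follows. The function name `isLowQuality` and the sample filenames are made up for illustration. The `i` flag is an assumption that stands in for the hand-written case variants; it also matches spellings the original list does not, such as `DVDrip`.

```php
<?php
// Hypothetical helper: true when $filename contains a low-quality tag
// and no HD tag (1080p, 720p, 1080i, 720i) anywhere in the string.
function isLowQuality(string $filename): bool
{
    // The lookahead is anchored at the start and scans the whole name,
    // so an HD tag is rejected whether it comes before or after the tag.
    // Assumption: the i flag replaces the upper/lower-case variants.
    $pattern = '~^(?!.*(?:1080|720)[pi]).*(?:xvid|divx|480p|320p|dvdscr|pdtv|dvdrip)~i';

    return preg_match($pattern, $filename) === 1;
}

// Sample filenames, invented for this example.
$files = [
    'Holiday - DVDRip - 720p.mkv',
    'Holiday - divx.avi',
    'Holiday - divx - 10bit - 1080p.mkv',
    'Holiday.mkv',
];

foreach ($files as $file) {
    echo $file, ' => ', isLowQuality($file) ? 'low quality' : 'keep', "\n";
}
```

Only `Holiday - divx.avi` is reported as low quality here. The two names with an HD tag are rejected by the lookahead, and the last one contains no low-quality tag at all.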

Upvotes: 1

HennyH

Reputation: 7944

Use two regexes.

First, check whether the filename matches:

1080p|720p|1080i|720i

Then, if no match is found for the above, check for matches against:

xvid|divx|480p|320p|DivX|XviD|DIVX|XVID|XViD|DiVX|DVDSCR|PDTV|pdtv|DVDRip|dvdrip|DVDRIP

Regular expressions don't support inverse matching directly. You could use negative lookarounds, but I wouldn't say they're appropriate for this task. A lookaround placed right next to the match can rule out a case like 1080p-divx, but it doesn't catch divx-10bit-1080p, where other text sits between the two terms. You couldn't achieve this with a simple regex.
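The two-step check described above could be sketched in PHP like this. The function name `isLowQualityTwoStep` and the sample filenames are hypothetical, and the `i` flag is an assumption used in place of the explicit case variants.

```php
<?php
// Hypothetical helper implementing the two-step check.
function isLowQualityTwoStep(string $filename): bool
{
    // Step 1: any HD tag means the file is kept, whatever else it contains.
    if (preg_match('~1080p|720p|1080i|720i~i', $filename) === 1) {
        return false;
    }

    // Step 2: otherwise, flag the file if it carries a low-quality tag.
    return preg_match('~xvid|divx|480p|320p|dvdscr|pdtv|dvdrip~i', $filename) === 1;
}

var_dump(isLowQualityTwoStep('Holiday - divx - 10bit - 1080p.mkv')); // bool(false)
var_dump(isLowQualityTwoStep('Holiday - divx.avi'));                  // bool(true)
```

Keeping the two patterns separate means each stays a plain alternation, and the order of the tags in the filename no longer matters.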

Upvotes: 2
