Reputation: 916
I have a string with movie titles and release year. I want to be able to detect the Title (Year) pattern and if matched wrap it in anchor tags.
Wrapping it is easy. But is it possible to write a regex to match this pattern if I don't know what the name of the movie will be?
Example:
$str = 'A random string with movie titles in it.
Movies like The Thing (1984) and other titles like Captain America Civil War (2016).
The movies could be anywhere in this string.
And some movies like 28 Days Later (2002) could start with a number.';
So the pattern will always be Title (starting with an uppercase letter) and will end with (Year).
This is what I have got so far:
if(preg_match('/^\p{Lu}[\w%+\/-]+\([0-9]+\)/', $str)){
error_log('MATCH');
}
else{
error_log('NO MATCH');
}
This currently does not work. From what I understand this is what it should do:
^\p{Lu} //match a word beginning with an uppercase letter
[\w%+\/-] //with any number of characters following it
+\([0-9]+\) //ending with an integer
Where am I going wrong with this?
Upvotes: 1
Views: 391
Reputation: 4523
The following regex should do it:
(?-i)(?<=[a-z]\s)[A-Z\d].*?\(\d+\)
Explanation
(?-i) : case-sensitive
(?<=[a-z]\s) : look-behind for a lower-case letter followed by a whitespace character
[A-Z\d] : match an upper-case letter or a digit
.*? : match any characters (lazily, as few as possible)
\(\d+\) : match one or more digits enclosed in parentheses
PHP
<?php
$regex = '/(?-i)(?<=[a-z]\s)[A-Z\d].*?\(\d+\)/';
$str = 'A random string with movie titles in it.
Movies like The Thing (1984) and other titles like Captain America Civil War (2016).
The movies could be anywhere in this string.
And some movies like 28 Days Later (2002) could start with a number.';
preg_match_all($regex, $str, $matches);
print_r($matches);
?>
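To make the effect of the look-behind `(?<=[a-z]\s)` concrete, here is a small sketch. The sample sentence is made up for illustration and is not part of the question. Because the look-behind needs a lower-case letter and a whitespace character directly before the match, a title at the very start of the string cannot be matched from its first word.

```php
<?php
// Same pattern as in the answer above.
$regex = '/(?-i)(?<=[a-z]\s)[A-Z\d].*?\(\d+\)/';

// Made-up sample: the first title sits at the very start of the string.
$str = 'The Thing (1984) is a classic, like Alien (1979).';

preg_match_all($regex, $str, $matches);

// No match can start at "The": nothing precedes it, so the look-behind fails.
// The first position that satisfies it is "Thing" (preceded by "e ").
print_r($matches[0]);
```

With this input the first title is picked up only from "Thing" onwards, while "Alien (1979)" is matched in full. If titles can open the text, the look-behind would need to be relaxed.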
Upvotes: 2
Reputation: 91430
This regex does the job:
~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d+\)~
Explanation:
~ : regex delimiter
(?: : start non capture group
[A-Z] : 1 capital letter (use \p{Lu} if you want to match titles in any language)
[a-zA-Z]+ : 1 or more letters (use \p{L} if you want to match titles in any language)
\s+ : 1 or more spaces
| : OR
\d+ : 1 or more digits
\s+ : 1 or more spaces
)+ : end group, repeated 1 or more times
\(\d+\) : 1 or more digits surrounded by parentheses (use \d{4} if the year is always 4 digits)
~ : regex delimiter
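The substitutions suggested in the explanation (`\p{Lu}`, `\p{L}` and `\d{4}`) can be combined as in the sketch below. The `u` modifier and the sample sentence are my additions: `u` assumes the input is UTF-8, so that accented letters are treated as single characters.

```php
<?php
// Unicode-aware variant: \p{Lu} = upper-case letter, \p{L} = any letter,
// \d{4} = exactly four digits. The "u" modifier assumes UTF-8 input.
$regex = '~(?:\p{Lu}\p{L}+\s+|\d+\s+)+\(\d{4}\)~u';

// Made-up sample; "\u{00E9}" is the letter e with an acute accent (PHP 7+ syntax).
$str = "Films such as Am\u{00E9}lie (2001) and 28 Days Later (2002), but not Route 66 (66).";

preg_match_all($regex, $str, $matches);

// "Route 66 (66)" is skipped because "(66)" is not a four-digit year.
print_r($matches[0]);
```

This should return the two titles followed by a four-digit year and ignore the last one.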
Implementation:
$str = 'A random string with movie titles in it.
Movies like The Thing (1984) and other titles like Captain America Civil War (2016).
The movies could be anywhere in this string.
And some movies like 28 Days Later (2002) could start with a number.';
if (preg_match_all('~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d+\)~', $str, $match)) {
print_r($match);
error_log('MATCH');
}
else{
error_log('NO MATCH');
}
Result:
Array
(
[0] => Array
(
[0] => The Thing (1984)
[1] => Captain America Civil War (2016)
[2] => 28 Days Later (2002)
)
)
MATCH
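Since the question also asks about wrapping each match in anchor tags, here is a minimal sketch using `preg_replace_callback` with the same pattern. The `/search?q=` URL is purely hypothetical, so substitute whatever link target you actually need.

```php
<?php
// Same pattern as above.
$pattern = '~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d+\)~';

$str = 'Movies like The Thing (1984) and 28 Days Later (2002).';

$linked = preg_replace_callback($pattern, function (array $m): string {
    // Hypothetical URL scheme: the whole "Title (Year)" text becomes a query string.
    $href = '/search?q=' . rawurlencode($m[0]);
    // Escape both the attribute value and the link text for HTML output.
    return '<a href="' . htmlspecialchars($href) . '">' . htmlspecialchars($m[0]) . '</a>';
}, $str);

echo $linked, PHP_EOL;
```

A callback is used rather than a plain `preg_replace` so that the matched text can be URL-encoded before it goes into the `href`.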
Upvotes: 0