Reputation: 64
I'm not very good at regex, and I've looked everywhere I could. I could use some help parsing this page (http://www.imdb.com/search/title?count=100&groups=oscar_best_picture_winners&sort=year,desc&ref_=nv_ch_osc_3) to get the movie names. P.S.: I could use a dummy regex too.
Upvotes: 0
Views: 1779
Reputation: 6148
This is almost the same problem as your previous question, and the answer is the same, albeit with a modified regex:
#<td class="number">(\d+)\.</td>.*?<a href="/title/tt\d+/">(.*?)</a>#s
https://stackoverflow.com/a/19600974/2573622
For more information you might want to check out the following link:
http://www.regular-expressions.info/
Click Tutorial in the top menu bar; it explains just about everything regex-related.
First, you have to get the relevant HTML (for one movie) from the page:
<td class="number">RANK.</td>
<td class="image">
<a href="/title/tt000000/" title="FILM TITLE (YEAR)"><img src="http://imdb.com/path-to-image.jpg" height="74" width="54" alt="FILM TITLE (YEAR)" title="FILM TITLE (YEAR)"></a>
</td>
<td class="title">
<span class="wlb_wrapper" data-tconst="tt000000" data-size="small" data-caller-name="search"></span>
<a href="/title/tt000000/">FILM TITLE</a>
You then strip out the noise and the parts that change from film to film:
<td class="number">RANK\.</td>.*?<a href="/title/tt\d+/">FILM TITLE</a>
Then add your capture groups:
<td class="number">(RANK)\.</td>.*?<a href="/title/tt\d+/">(FILM TITLE)</a>
Finally, replace the placeholders with patterns (\d+ for the rank, .*? for the title), and that's it:
#<td class="number">(\d+)\.</td>.*?<a href="/title/tt\d+/">(.*?)</a>#s
The s modifier after the closing pattern delimiter makes . match newlines as well, so .*? can span several lines of HTML.
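To see what the s modifier changes, here is a minimal sketch that runs the same pattern with and without it. The two-line $html fragment is a made-up stand-in for the real page markup, not something copied from IMDb.

```php
<?php
// Hypothetical two-line fragment standing in for the real page markup.
$html = "<td class=\"number\">1.</td>\n<a href=\"/title/tt000000/\">FILM TITLE</a>";

// The pattern body, without delimiters or modifiers.
$pattern = '<td class="number">(\d+)\.</td>.*?<a href="/title/tt\d+/">(.*?)</a>';

// Without s, "." stops at the newline, so the pattern cannot span both lines.
$withoutS = preg_match('#' . $pattern . '#', $html);

// With s, "." also matches "\n", so the pattern matches across the lines.
$withS = preg_match('#' . $pattern . '#s', $html, $m);

echo $withoutS, "\n"; // 0 (no match)
echo $withS, "\n";    // 1 (match)
echo $m[2], "\n";     // FILM TITLE
```

preg_match returns 1 on a match and 0 otherwise, which makes the difference easy to check.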
Same as in previous answer (with modified regex)
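// Fetch the search results page (requires allow_url_fopen to be enabled).
$page = file_get_contents('http://www.imdb.com/search/title?count=100&groups=oscar_best_picture_winners&sort=year,desc&ref_=nv_ch_osc_3');
// Capture the rank (group 1) and the film title (group 2) for every row.
preg_match_all('#<td class="number">(\d+)\.</td>.*?<a href="/title/tt\d+/">(.*?)</a>#s', $page, $matches);
// Build a rank => title array from the two capture groups.
$filmList = array_combine($matches[1], $matches[2]);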
Then you can do:
echo $filmList[1];
/**
Output:
Argo
*/
echo array_search("The Artist", $filmList);
/**
Output:
2
*/
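If you want to try this without hitting the live page (whose markup may have changed), the following is a rough sketch that runs the same regex against an inline sample. The extractFilms helper, the tt IDs, and the image file names are all hypothetical and exist only for illustration; the sample rows follow the structure shown above.

```php
<?php
// Hypothetical helper: returns a rank => title array, or an empty array
// when nothing matches (or when preg_match_all reports an error).
function extractFilms(string $page): array
{
    $pattern = '#<td class="number">(\d+)\.</td>.*?<a href="/title/tt\d+/">(.*?)</a>#s';
    if (!preg_match_all($pattern, $page, $matches)) {
        return [];
    }
    return array_combine($matches[1], $matches[2]);
}

// Made-up sample rows; the tt IDs and image names are placeholders.
$sample = <<<'HTML'
<td class="number">1.</td>
<td class="image">
<a href="/title/tt0000001/" title="Argo (2012)"><img src="x.jpg" alt="Argo (2012)"></a>
</td>
<td class="title">
<a href="/title/tt0000001/">Argo</a>
<td class="number">2.</td>
<td class="image">
<a href="/title/tt0000002/" title="The Artist (2011)"><img src="y.jpg" alt="The Artist (2011)"></a>
</td>
<td class="title">
<a href="/title/tt0000002/">The Artist</a>
HTML;

$filmList = extractFilms($sample);

echo $filmList[1], "\n";                          // Argo
echo array_search('The Artist', $filmList), "\n"; // 2
```

The image link in each row is skipped because its href is followed by a title attribute, while the pattern requires the tag to close right after the href.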
http://php.net/manual/en/reference.pcre.pattern.modifiers.php
http://php.net/file_get_contents
http://php.net/preg_match_all
http://php.net/array_combine
http://php.net/array_search
Upvotes: 3
Reputation:
Not sure which backslashes you do/don't need:
href=\"\/title\/tt.*height=\"74\" width=\"54\" alt=\"([^"]*)\"
The useful result is in capture group 1 (\1 or $1).
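On the backslash question, here is a small sketch of how this pattern could look in PHP, assuming # as the delimiter and a single-quoted string; with that combination neither / nor " needs escaping. The sample $line is made up (placeholder tt ID, image URL taken from the example markup in the other answer).

```php
<?php
// One image-cell line in the structure shown in the other answer; the tt ID
// is a placeholder.
$line = '<a href="/title/tt0000001/" title="Argo (2012)"><img src="http://imdb.com/path-to-image.jpg" height="74" width="54" alt="Argo (2012)" title="Argo (2012)"></a>';

// With # as the delimiter, "/" is not special, and inside a single-quoted
// PHP string the double quotes need no escaping either.
$pattern = '#href="/title/tt.*height="74" width="54" alt="([^"]*)"#';

preg_match_all($pattern, $line, $matches);

// Capture group 1 holds the alt text, which includes the year.
$titles = $matches[1];

echo $titles[0], "\n"; // Argo (2012)
```

In PHP the group is read from $matches[1] rather than \1 or $1. Because there is no s modifier here, .* cannot cross a newline, so each match stays within a single line of the page.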
Upvotes: 0