Reputation: 55
I want to create a script that automatically grabs the text inside an element with a specific class on a Wikipedia page. For example, I want to get the musician Avicii's real name (Tim Bergling) from his Wikipedia page. From Google's Inspect Element I found that his name is in a table cell with the class "nickname":
<td class="nickname">Tim Bergling</td>
I would like to get the contents of the nickname class. I found a few threads that helped me out with some of the code, but I cannot get it to work correctly. Here is what I have so far:
<?php
$wiki= file_get_contents("http://en.wikipedia.org/wiki/Avicii");
preg_match("/\<td class\=\"nickname\"\>(.*?)\<\/td\>/",$wiki,$n);
print $n;
?>
Ultimately I want this name sent to a specific class on my website where it will be displayed. For now, I would just be content getting it to print. Thanks :)
Edit: I should clarify that I'm very new to PHP and coding in general, but I've picked it up quick and I'm trying to push myself. I appreciate your time very much!
Upvotes: 2
Views: 3978
Reputation: 165
You should be using the DOMDocument class instead of preg_match. Try:
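// Fetch the page; replace "your url" with the page address
$html = file_get_contents("your url");
// Real-world HTML may trigger libxml warnings; collect them instead of printing them
libxml_use_internal_errors(true);
$DOM = new DOMDocument();
$DOM->loadHTML($html);
libxml_clear_errors();
$finder = new DomXPath($DOM);
$classname = 'nickname';
// Select every element whose class attribute contains "nickname"
$nodes = $finder->query("//*[contains(@class, '$classname')]");
foreach ($nodes as $node) {
    echo $node->nodeValue;
}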
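As a side note on the original attempt: preg_match fills $n with an array (the full match at index 0, the captured group at index 1), so print $n outputs "Array" rather than the name. Also, contains(@class, ...) matches substrings, so it would pick up a class such as "nickname-extra" as well. Below is a minimal sketch of a stricter variant that matches the class as a whole word. The inline $html string is made-up sample markup standing in for the fetched page, and the "nickname-extra" cell is hypothetical, included only to show the difference.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = '<table><tr><td class="nickname">Tim Bergling</td></tr>'
      . '<tr><td class="nickname-extra">ignored</td></tr></table>';

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$class = 'nickname';

// Pad the class attribute with spaces so only a whole class name matches,
// not a substring of a longer class name.
$query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";

$names = [];
foreach ($xpath->query($query) as $node) {
    $names[] = trim($node->textContent);
}

// Prints: Tim Bergling
echo implode(', ', $names), PHP_EOL;
```

To use it on the real page, $html would come from file_get_contents as in the snippet above; whether the live page still uses the "nickname" class is something to check in the browser first.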
Upvotes: 2