Treps

Reputation: 800

Can't get preg_match() to work to fetch content from another website

I'm trying to fetch a value from an external website using preg_match() with a regex that matches the tag, but it's not working.

My code

$file = file_get_contents('http://www.investing.com/indices/us-spx-500');

$regexp = '/\<span class\=\"arial_26 inlineblock pid-166-last\" id\=\"last_last\" dir\=\"ltr\"\>(.*?)\<\/span>/';
preg_match($regexp, $file, $string1);

print_r(array_values($string1));

The tag I need to match is:

<span class="arial_26 inlineblock pid-166-last" id="last_last" dir="ltr">1,880.02</span>

1,880.02 = (.*?)

I need to fetch the value of the S&P 500 index. I know it might be a copyright issue, but this is just for private use. As you can see in $regexp, I have escaped all the special characters. I tested the same code against a tag in a TXT file and it works, so I know the code itself is correct. It must be an issue with the regex. Can someone figure it out, or have I missed something? The array is empty.

I thought it was because of the white space in the class attribute, so I tried \s, but that didn't work either.

I have also tried the following, without success:

$regexp = '#<span class="arial_26 inlineblock pid-166-last" id="last_last" dir="ltr">(.*?)</span>#';

If you check the website's source code, that should be the exact tag.

Thanks in advance.

Upvotes: 0

Views: 371

Answers (2)

Casimir et Hippolyte

Reputation: 89584

PHP has built-in tools to parse HTML. Regex is not appropriate here, particularly since you are looking for a node with an id attribute!

// set the user_agent to any name you want
$opts = [ 'http' => [ 'user_agent' => 'obliglobalgu' ] ];
// create a stream context from those options
$context = stream_context_create($opts);
// make DOMDocument::loadHTMLFile use that stream context
libxml_set_streams_context($context);

$url = 'http://www.investing.com/indices/us-spx-500';

libxml_use_internal_errors(true); // keep any libxml parse errors from being displayed

$dom = new DOMDocument;
$dom->loadHTMLFile($url);

$spanNode = $dom->getElementById('last_last');

if ($spanNode)
    echo $spanNode->nodeValue;

libxml_clear_errors();
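
The id lookup can also be tried offline against the tag quoted in the question, without touching the network. This is only a minimal sketch: the $html string is the sample markup from the question wrapped in a hypothetical html/body skeleton, and it uses DOMXPath as an alternative to getElementById, which is a swap-in and not part of the code above.

```php
<?php
// Sample markup copied from the question; the html/body wrapper is an assumption.
$html = '<html><body>'
      . '<span class="arial_26 inlineblock pid-166-last" id="last_last" dir="ltr">1,880.02</span>'
      . '</body></html>';

libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

// Select the span by its id attribute, whatever its class or dir values are.
$xpath = new DOMXPath($dom);
$node  = $xpath->query('//span[@id="last_last"]')->item(0);

// $price stays null when the node is missing, e.g. if the markup changes.
$price = $node ? trim($node->nodeValue) : null;
echo $price, PHP_EOL;
```

Because the lookup depends only on the id, it keeps working if the class list or attribute order changes.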

Upvotes: 2

Michail Strokin

Reputation: 541

It doesn't work because investing.com doesn't return anything if you don't send it a user agent. The following code works:

$options = array(
  'http'=>array(
    'method'=>"GET",
    'header'=>"Accept-language: en\r\n" .
              "User-Agent: Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10\r\n" // e.g. an iPad
  )
);
$context = stream_context_create($options);
$file = file_get_contents('http://www.investing.com/indices/us-spx-500',false,$context);
$regexp = '/\<span class=\"arial_26 inlineblock pid-166-last\" id=\"last_last\" dir\=\"ltr\"\>(.*?)<\/span>/';
preg_match($regexp, $file, $string1);
print_r(array_values($string1));

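Also, the only character you need to escape in that pattern is /, because it is the regex delimiter. There is no need to escape =, < or >, and inside a single-quoted PHP string " does not need escaping either.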
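
As a rough illustration of the escaping point, here is a sketch that runs a pattern against the tag from the question as a local string. It makes some assumptions of its own: it uses # as the delimiter so that nothing in the pattern needs escaping, it matches on the id attribute only instead of the full attribute list, and the variable names are hypothetical.

```php
<?php
// The tag quoted in the question, used as local test input (no network access).
$html = '<span class="arial_26 inlineblock pid-166-last" id="last_last" dir="ltr">1,880.02</span>';

// With # as the delimiter, neither / nor " needs a backslash in a single-quoted string.
// [^>]* skips the other attributes, so only the id has to match.
$pattern = '#<span[^>]*\bid="last_last"[^>]*>(.*?)</span>#s';

$value = null;
if (preg_match($pattern, $html, $matches)) {
    $value = $matches[1];           // captured text between the tags
}

// Strip the thousands separator to get a number usable in calculations.
$number = $value !== null ? (float) str_replace(',', '', $value) : null;

var_dump($value, $number);
```

If preg_match() finds nothing on the live page, it is worth checking first that file_get_contents() did not return false or an empty string before suspecting the pattern.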

Upvotes: 1
