maggiemh

Reputation: 25

Extracting data from a website using PHP

I have the following website: http://stationmeteo.meteorologic.net/metar/your-metar.php?icao=LFRS&day=070308

I want to extract data from it. I tried using file_get_contents and some regular expressions, but something is not working.

This is the code I tried:

$content=file_get_contents('http://stationmeteo.meteorologic.net/metar/your-metar.php? icao=LFMN&day=010513');

preg_match('/00\:30 07\/03\/2008(.+)01\:30 07\/03\/2008/',$content,$m);
echo $m[0];
echo $m[1];

It gives me "undefined offset" notices for indexes 0 and 1. If I paste the content of the web page directly into $content instead of using file_get_contents, it works fine.

What am I missing?

Upvotes: 2

Views: 127

Answers (1)

Tim Pietzcker

Reputation: 336108

The problem is that, by default, .+ matches any character except a newline, and there is a newline character in the text you're trying to match.

Try adding the s (dot-all) modifier, which makes the dot match newlines too:

preg_match('~00:30 07/03/2008(.+)01:30 07/03/2008~s',$content,$m);

(using ~ as a delimiter so you don't have to escape all those slashes, by the way)
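To see the effect of the s modifier in isolation, here is a minimal sketch. The $content string is a made-up sample standing in for the page source (the real markup will differ); only the newline between the two timestamps matters.

```php
<?php
// Hypothetical sample; the real page source will look different.
$content = "00:30 07/03/2008 LFRS 062330Z\n27008KT 9999 01:30 07/03/2008";

// Without the s modifier, .+ cannot cross the newline, so nothing matches.
$withoutS = preg_match('~00:30 07/03/2008(.+)01:30 07/03/2008~', $content, $m1);

// With the s modifier, the dot also matches newlines.
$withS = preg_match('~00:30 07/03/2008(.+)01:30 07/03/2008~s', $content, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)

// Read $m2 only after a successful match; this avoids "undefined offset" notices.
if ($withS === 1) {
    echo trim($m2[1]), "\n";
}
```

Checking the return value of preg_match before indexing into the matches array is a design choice worth keeping in real code, since a failed match leaves the array empty.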

The next question is: why doesn't this problem occur when you copy the contents of the web page directly into $content? When a browser renders a page, runs of whitespace are collapsed into a single space, so the \n that's present in the page's source code (press Ctrl-U to see it) becomes a plain space in the copied text. And .+ matches that space.
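The whitespace collapsing described above can be imitated in a rough sketch. The sample string and the preg_replace call are illustrative assumptions: a browser's whitespace handling is more involved than a single substitution, but the effect on the pattern is the same.

```php
<?php
// Hypothetical page source with a newline between the two timestamps.
$source = "00:30 07/03/2008 LFRS\n   062330Z 01:30 07/03/2008";

// Rough approximation of rendering: collapse each whitespace run to one space.
$rendered = preg_replace('/\s+/', ' ', $source);

// The original pattern, without the s modifier.
$pattern = '~00:30 07/03/2008(.+)01:30 07/03/2008~';

$matchesSource = preg_match($pattern, $source);     // newline blocks .+
$matchesRendered = preg_match($pattern, $rendered); // only spaces remain

var_dump($matchesSource);   // int(0)
var_dump($matchesRendered); // int(1)
```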

Upvotes: 2
