Reputation: 25
I have the following website: http://stationmeteo.meteorologic.net/metar/your-metar.php?icao=LFRS&day=070308
I want to extract data from it. I tried using file_get_contents and some regular expressions, but something is not working.
This is the code I tried:
$content=file_get_contents('http://stationmeteo.meteorologic.net/metar/your-metar.php?icao=LFMN&day=010513');
preg_match('/00\:30 07\/03\/2008(.+)01\:30 07\/03\/2008/',$content,$m);
echo $m[0];
echo $m[1];
It's giving me undefined offset 0 and 1. If I copy the content of the web page directly to $content instead of using file_get_contents, it works fine.
What am I missing?
Upvotes: 2
Views: 127
Reputation: 336108
The problem is that .+ matches any character except newlines, and there is a newline character in the text you're trying to match.
Try
preg_match('~00:30 07/03/2008(.+)01:30 07/03/2008~s',$content,$m);
The s modifier makes the dot match newlines as well. (I'm using ~ as the delimiter so you don't have to escape all those slashes, by the way.)
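Here is a minimal sketch of the difference. It uses a made-up two-line string instead of the real page source, so $sample and the text between the timestamps are assumptions for illustration, not actual output from the site:

```php
<?php
// Hypothetical stand-in for the page source: the two timestamps sit on different lines.
$sample = "00:30 07/03/2008 some report text\n01:30 07/03/2008 more report text";

// Without the s modifier, . stops at the newline, so nothing matches.
$withoutS = preg_match('~00:30 07/03/2008(.+)01:30 07/03/2008~', $sample, $m1);

// With the s modifier (PCRE_DOTALL), . also matches "\n".
$withS = preg_match('~00:30 07/03/2008(.+)01:30 07/03/2008~s', $sample, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)

// Only read $m2 when preg_match() reports a match; this avoids the undefined offset notice.
if ($withS === 1) {
    echo trim($m2[1]), "\n";
}
```

Checking the return value of preg_match before touching $m is also what keeps the "undefined offset" notices from appearing when the pattern fails.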
The next question is: why don't I get this problem when copying the contents of the web page directly into $content? Well, when a web page is rendered, each run of whitespace is collapsed to a single space, turning the \n that's present in the page's source code (press Ctrl-U to see it) into a simple space. And .+ matches that space.
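You can imitate that effect roughly in PHP. In this sketch, $source is a hypothetical snippet of page source, and preg_replace on \s+ is only an approximation of what the browser does when rendering, not an exact model of it:

```php
<?php
// Hypothetical page source with a newline between the two timestamps.
$source = "00:30 07/03/2008 some report text\n01:30 07/03/2008";

// Rough imitation of text copied from the rendered page:
// every run of whitespace becomes one space.
$rendered = preg_replace('~\s+~', ' ', $source);

// Same pattern as in the question, deliberately without the s modifier.
$pattern = '~00:30 07/03/2008(.+)01:30 07/03/2008~';

$matchSource   = preg_match($pattern, $source);
$matchRendered = preg_match($pattern, $rendered);

var_dump($matchSource);   // int(0): . cannot cross "\n"
var_dump($matchRendered); // int(1): the newline is now a space
```

So the same pattern fails on the raw source and succeeds on the copied text, which is the behaviour described in the question.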
Upvotes: 2