Reputation: 25
I'm trying to extract a value from a multiline pattern with PHP and preg_match. This is the pattern I'm searching for within the string I'm passing to preg_match($regex, $string, $the_match):
Latitude:</td>
<td class="formCell">
40-45-40.205 N
</tr>
I know that if it were all on one line like so:
Latitude:</td><td class="formCell">40-45-40.205 N</tr>
Then the following would be valid and it would properly extract the value:
/Latitude:<\/td><td class="formCell">(.*?)<\/tr>/
However, since the pattern I'm looking for spans multiple lines, the above regex doesn't work. I'm getting the initial string I'm passing to preg_match() via file_get_contents($url), so I'm at the mercy of the remote content to some extent. Any help would be much appreciated!
Upvotes: 1
Views: 1939
Reputation: 34576
Use [\s\S] instead of the dot (.):
/Latitude:<\/td>[\s]*<td class="formCell">([\s\S]*?)<\/tr>/
The dot (.) is a wildcard, but by default it does not match line-break characters. [\s\S] simply says "match all space and non-space characters" (i.e. anything at all, line breaks included).
Note that I also allowed for optional whitespace after </td>.
(Sidenote: the HTML is invalid - closing a table row before closing the table cell.)
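To make this concrete, here is a minimal sketch of the pattern above run through preg_match. The $string value is a hand-written stand-in for the fetched page (an assumption; the real content would come from file_get_contents($url)), and the trim() call is my own addition, since the capture includes the line breaks around the value.

```php
<?php
// Stand-in for the remote content; in the question this comes from file_get_contents($url).
$string = "Latitude:</td>\n<td class=\"formCell\">\n40-45-40.205 N\n</tr>";

// [\s\S]*? lazily matches any character, including line breaks.
$regex = '/Latitude:<\/td>[\s]*<td class="formCell">([\s\S]*?)<\/tr>/';

$latitude = null;
if (preg_match($regex, $string, $the_match)) {
    // The captured group still contains the surrounding newlines, so strip them.
    $latitude = trim($the_match[1]);
}

echo $latitude, PHP_EOL;
```

If the page layout differs from this sample, preg_match returns 0 and $latitude stays null, so it is worth checking for that case before using the value.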
Upvotes: 6
Reputation: 18290
I think the trick is to "sprinkle" [\s]* anywhere the HTML format would legally allow whitespace. You do not need special flags or anything.
Latitude:[\s]*<\/td>[\s]*<td[\s]*class="formCell">[\s]*([\s\S]*?)[\s]*<\/tr>
Keep in mind that HTML is VERY forgiving about whitespace. You need to evaluate your input and decide how much tolerance is acceptable for you.
Another caveat is that these elements may have different attributes, or different quote styles... If you must work with that as well, you will need to use more of the dot (.) and then use the "ungreedy" flag (add U after the closing delimiter of the pattern when passing it to the preg functions; note that lowercase u is the unrelated UTF-8 flag); and then perhaps some fancy back-referencing once you realize that > can legally occur inside of an attribute ;-)
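As a rough sketch of the whitespace-tolerant pattern in use: the input below is a made-up variant with extra spaces and a tab (an assumption, not the real page), and I have required at least one whitespace character between <td and class (\s+ rather than [\s]*), since a tag name and its attribute cannot legally run together.

```php
<?php
// Hypothetical input with irregular whitespace around the tags and the value.
$string = "Latitude:  </td>\n\t<td   class=\"formCell\">\n   40-45-40.205 N\n  </tr>";

// \s* is placed wherever whitespace may appear; \s+ separates the tag name from its attribute.
$regex = '/Latitude:\s*<\/td>\s*<td\s+class="formCell">\s*([\s\S]*?)\s*<\/tr>/';

$tolerantLatitude = null;
if (preg_match($regex, $string, $m)) {
    // The \s* on both sides of the group absorbs the padding, so no trim() is needed here.
    $tolerantLatitude = $m[1];
}

echo $tolerantLatitude, PHP_EOL;
```

Because the capture group is lazy and is followed by \s*, the trailing line break and indentation end up outside the captured value.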
Upvotes: 0
Reputation: 2693
There is no simple flag for this. A simple hack could be:
Latitude:(.*?)<\/td>(.*?)<td class="formCell">(.*?)<\/tr>
And then add the dotall flag (s) to your regex to allow a dot (.) to match newlines as well. But then it could match a lot more than you intend. Is it your own code, or are you ripping HTML from a third-party website? Because maybe you are using regexes when you don't have to!
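A minimal sketch of that hack with the s flag applied, assuming a sample string shaped like the one in the question. Note that with three groups in the pattern the value lands in the third one, and it still needs a trim().

```php
<?php
// Sample input shaped like the markup in the question (an assumption about the real page).
$string = "Latitude:</td>\n<td class=\"formCell\">\n40-45-40.205 N\n</tr>";

// The trailing "s" is the dotall modifier: with it, "." also matches newlines.
$regex = '/Latitude:(.*?)<\/td>(.*?)<td class="formCell">(.*?)<\/tr>/s';

$dotallLatitude = null;
if (preg_match($regex, $string, $m)) {
    // Groups 1 and 2 only soak up filler between the tags; the value is in group 3.
    $dotallLatitude = trim($m[3]);
}

echo $dotallLatitude, PHP_EOL;
```

Since each (.*?) can now cross line boundaries, a page with several similar rows could pair "Latitude:" with a later formCell, which is the "match a lot more" risk mentioned above.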
Upvotes: 0