Reputation: 2866
Problem One:
</a>
19-10-2011, 04:49 PM
</td> <td class="thread"
How can I fetch the date and time, i.e. 19-10-2011, 04:49 PM?
Note: the above snippet could have unstable spacing as you see above e.g. </td> <td class
My attempt:
preg_match("#</a>(.*?)</td> <td class=\"thread\"#", $page, $fetchContent);
Result: empty
Problem Two:
<div id="post_message_43345">ANY TYPE OF CONTENT INCLUDING SPACES</tr> <tr>
I need to fetch "ANY TYPE OF CONTENT".
Note: the spacing between tags such as </tr> <tr> could vary from one page to another.
My attempt:
preg_match("#<div id=\"post_message_[a-zA-Z0-9_]*\">(.*?)</tr> <tr>#", $page, $fetchedContent);
Result: empty
I'm looking for a rough, short, temporary snippet for a one-off task, so I didn't use an HTML parser.
Any help will be appreciated.
Upvotes: 0
Views: 116
Reputation: 56905
You need to use the s flag to have . match newline characters too:
preg_match("#</a>(.*?)</td> <td class=\"thread\"#s", $page, $fetchContent);
You'd probably be better off matching the date directly though:
preg_match("#([0123]?[0-9]-(?:0?[1-9]|1[012])-(?:[0-9]{4})),? ?((?:0[0-9]|1[012]):[0-5][0-9] ?[AP]M)#",...)
edit - this date regex will be a little faster (added boundaries either side):
preg_match("#\\b([0123]?[0-9]-(?:0?[1-9]|1[012])-(?:[0-9]{4}))[, ]{1,3}((?:0[0-9]|1[012]):[0-5][0-9] ?[AP]M)\\b#",...)
For both, the date is in $results[1] and the time is in $results[2].
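As a rough check of the boundary version above, here is a minimal sketch run against a made-up $page string (the sample input is an assumption modelled on the question's snippet, not real page data). It uses a single-quoted pattern so that \b does not need double escaping.

```php
<?php
// Hypothetical sample input; the newlines and extra spaces mimic the
// unstable spacing described in the question.
$page = "</a>\n19-10-2011, 04:49 PM\n</td>   <td class=\"thread\"";

// Same date/time regex as above, written in single quotes.
$pattern = '#\b([0123]?[0-9]-(?:0?[1-9]|1[012])-(?:[0-9]{4}))[, ]{1,3}((?:0[0-9]|1[012]):[0-5][0-9] ?[AP]M)\b#';

$date = null;
$time = null;
if (preg_match($pattern, $page, $results) === 1) {
    $date = $results[1]; // first capture group: the date
    $time = $results[2]; // second capture group: the time with AM/PM
}

echo "$date | $time\n"; // prints "19-10-2011 | 04:49 PM"
```

Because this pattern never mentions the surrounding tags, the spacing around </td> and <td does not affect it.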
Again the s flag, and to allow a varying number of spaces between </tr> and <tr>, put a space followed by * between them.
preg_match("#<div id=\"post_message_[a-zA-Z0-9_]*\">(.*?)</tr> *<tr>#s", $page, $fetchedContent);
If you want to allow for newlines between the </tr> and <tr> then do \s* instead. Same for Problem 1.
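A minimal sketch of the \s* variant for Problem Two, assuming a made-up $page in which the post body spans two lines and the closing tags are separated by a newline and indentation:

```php
<?php
// Hypothetical sample page; not taken from a real forum.
$page = "<div id=\"post_message_43345\">ANY TYPE OF CONTENT\nINCLUDING SPACES</tr>\n   <tr>";

// s flag: . also matches newlines.
// \s*: any run of spaces, tabs or newlines (or none) between the tags.
$pattern = '#<div id="post_message_[a-zA-Z0-9_]*">(.*?)</tr>\s*<tr>#s';

$content = null;
if (preg_match($pattern, $page, $fetchedContent) === 1) {
    // trim() drops stray whitespace at either end of the captured body.
    $content = trim($fetchedContent[1]);
}

var_dump($content);
```

The lazy (.*?) stops at the first </tr> that is followed by optional whitespace and <tr>, so a post body that itself contains that tag sequence would be cut short.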
Upvotes: 1
Reputation: 145482
Note: the above snippet could have unstable spacing as you see above
You want it to match newlines also. The . doesn't do that normally. This basically requires the s modifier after the closing # delimiter:
preg_match('#</a>(.*?)</td> <td class="thread"#s', ...
But you could also just add \s* on both sides of your (.*?) capture group, and also between the </td> and <td.
And then you could make your regex more specific, e.g. \d\d-\d\d-\d{4}, \d\d:\d\d (the year in your sample has four digits), to only capture the date. That might make matching the tags somewhat redundant.
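Both suggestions can be tried side by side. This is only a sketch against a made-up $page string; the \d-based pattern uses a four-digit year because the sample date is 19-10-2011.

```php
<?php
// Hypothetical sample input with newlines around the date.
$page = "</a>\n19-10-2011, 04:49 PM\n</td> <td class=\"thread\"";

// Option 1: keep the tags, but absorb surrounding whitespace with \s*.
// No s modifier is needed here because the date sits on a single line.
$withTags = '#</a>\s*(.*?)\s*</td>\s*<td class="thread"#';
preg_match($withTags, $page, $m);
$dateTime = $m[1] ?? null; // "19-10-2011, 04:49 PM"

// Option 2: match the date and time directly and ignore the tags.
$specific = '#\d\d-\d\d-\d{4}, \d\d:\d\d#';
preg_match($specific, $page, $d);
$dateOnly = $d[0] ?? null; // "19-10-2011, 04:49" (no AM/PM part)

var_dump($dateTime, $dateOnly);
```

Option 2 drops the AM/PM suffix as written; appending " [AP]M" to the pattern would include it.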
Note: the spacing between tags such as </tr> <tr> could vary from one page to another.
You can again just use \s* which matches spaces and newlines in any combination.
Upvotes: 1