Reputation: 2250
I am trying to get all the anchor tags from a given piece of HTML whose content is longer than 30 characters, i.e. if I have this HTML:
<td><a href="anything">Content is more than 30 chars........</a>
<a href="anything">another link</a>
</td>
I have written this regex for it:
preg_match_all("/<a href=\"(.*)\"[^>]*>([a-zA-Z0-9]{30,999})<\\/[a-zA-Z]+>/si", $match[0], $posts);
where 30 sets the minimum length of the anchor tag content to 30 characters, but unfortunately this is not working.
Can anyone point out what I have done wrong?
Thanks
Note: I am trying to fetch the URLs from this page: This Link
Upvotes: 2
Views: 861
Reputation: 19407
Would something as simple as
<a.*?>.{30,}?</a>
not suffice? The above looks for anchor tags, with their content being 30 characters or more. It does not attempt to validate the href attribute or any other attributes of the link. It can be altered if these are required.
Translated into a preg_match_all call, this becomes (thanks to @php_nub_qq):
preg_match_all("#<a.*?>.{30,}?</a>#", $match[0],$posts);
The URL you have linked contains letters, numbers, and non-alphanumeric characters in the link text. As you have little control over the source, it might be best to generalise the pattern as above rather than attempt to whitelist on a per-character basis.
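As a rough sketch of how the pattern above behaves, here it is run against markup adapted from the question. The $html sample is only an illustrative input, and strip_tags is used here just to print the link text; neither is part of the original suggestion.

```php
<?php
// Sample markup adapted from the question (illustrative input only).
$html = <<<HTML
<td><a href="anything">Content is more than 30 chars........</a>
<a href="anything">another link</a>
</td>
HTML;

// Match any anchor whose body is at least 30 characters long.
// Without the s modifier, "." does not match a newline, so a match
// cannot run from one line into the next.
preg_match_all('#<a.*?>.{30,}?</a>#', $html, $posts);

// $posts[0] holds the full text of each matching anchor.
foreach ($posts[0] as $anchor) {
    echo strip_tags($anchor), PHP_EOL;
}
```

With this input only the first link is matched, since "another link" is shorter than 30 characters. Note that if several anchors share one line, ".{30,}?" can run across a closing tag into the next anchor, so this is a simplification that suits input with one link per line.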
Upvotes: 2
Reputation: 780798
Try this:
preg_match_all("/<a href=\"(.*)\"[^>]*>([a-z\d\s]{30,})<\\/[a-z]+>/si", $match[0],$posts);
Since you have the i case-insensitive modifier, you don't need both a-z and A-Z in your classes. And if you're just setting a minimum length of the content, you don't need to specify a maximum of 999; {30,} means 30 or more.
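A small sketch to illustrate both points, using made-up strings ($short and $long are hypothetical test values, and the ^ and $ anchors are added here only so the whole string is checked):

```php
<?php
// Hypothetical link texts used only for illustration.
$short = str_repeat('a', 29);    // one character below the minimum
$long  = str_repeat('A', 1200);  // uppercase, and longer than 999

// {30,} has no upper bound; the i modifier lets a-z match uppercase too.
$pattern = '/^[a-z\d\s]{30,}$/i';

var_dump(preg_match($pattern, $short)); // int(0)
var_dump(preg_match($pattern, $long));  // int(1)

// With an explicit maximum, the 1200-character string is rejected.
var_dump(preg_match('/^[a-z\d\s]{30,999}$/i', $long)); // int(0)
```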
Upvotes: 0