Ravi Soni

Reputation: 2250

PHP regex conditional get content and link from HTML anchor tag

I am trying to get all the anchor tags from a given HTML document whose content is longer than 30 characters. For example, given this HTML:

<td><a href="anything">Content is more than 30 chars........</a>
<a href="anything">another link</a>
</td>

I have written this regex for it: preg_match_all("/<a href=\"(.*)\"[^>]*>([a-zA-Z0-9]{30,999})<\\/[a-zA-Z]+>/si", $match[0],$posts);

where 30 sets a minimum of 30 characters for the anchor tag content, but unfortunately this is not working.

Can anyone point out what I have done wrong?

Thanks

Note: I am trying to fetch the URLs from this page: This Link

Upvotes: 2

Views: 861

Answers (2)

Kami

Reputation: 19407

Would something as simple as

<a.*?>.{30,}?</a>

not suffice? The above looks for anchor tags, with their content being 30 characters or more. It does not attempt to validate the href attribute or any other attributes of the link. It can be altered if these are required.

This is translated into preg_match_all as (thanks to @php_nub_qq)

preg_match_all("#<a.*?>.{30,}?</a>#", $match[0],$posts);

The URL you have linked contains letters, numbers, and non-alphanumeric characters in the URL string. As you have little control over the source, it might be best to generalise the pattern as above rather than attempt to whitelist individual characters.
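
As a minimal sketch of how the generalised pattern might be altered to also capture the link, assuming some sample markup in a hypothetical $html variable (the real page may differ): the content is restricted to [^<] so that a match cannot run across a closing </a>, and the href value is captured in its own group. Both are adjustments for illustration, not part of the pattern above.

```php
<?php
// Hypothetical sample input; the markup of the real page may differ.
$html = '<td><a href="/first">Content is more than 30 chars, with punctuation!</a>'
      . '<a href="/second">another link</a></td>';

// Group 1: the href value. Group 2: link text of at least 30 characters,
// none of which may be "<", so the match stays inside a single anchor.
$pattern = '#<a\s[^>]*?href="([^"]*)"[^>]*>([^<]{30,})</a>#i';

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all($pattern, $html, $posts, PREG_SET_ORDER);

foreach ($posts as $post) {
    echo $post[1], ' => ', $post[2], PHP_EOL;
}
```

Only the first anchor is reported here, because the text of the second one is shorter than 30 characters.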

Upvotes: 2

Barmar

Reputation: 780798

Try this:

preg_match_all("/<a href=\"(.*)\"[^>]*>([a-z\d\s]{30,})<\\/[a-z]+>/si", $match[0],$posts);

Since you have the i case-insensitive modifier, you don't need both a-z and A-Z in your classes. And if you're just setting a minimum length of the content, you don't need to specify a maximum of 999; {30,} means 30 or more.
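
A small sketch to illustrate both points, using a hypothetical $html sample with mixed-case link text. Note one assumption that goes beyond the pattern above: (.*) is swapped for ([^"]*) so that the href capture stops at the closing quote instead of greedily running on to a later one.

```php
<?php
// Hypothetical sample input with mixed-case link text.
$html = '<a href="/long">This Link Text Has More Than Thirty Characters 123</a>'
      . '<a href="/short">Too short</a>';

// With the i modifier, [a-z\d\s] also matches A-Z; {30,} has no upper bound.
// ([^"]*) is an adjustment: it keeps the href capture inside its own quotes.
$pattern = '/<a href="([^"]*)"[^>]*>([a-z\d\s]{30,})<\/a>/i';

$count = preg_match_all($pattern, $html, $posts);

echo $count, PHP_EOL;   // number of anchors whose text qualified
print_r($posts[1]);     // captured href values
```

This character class still rejects link text containing punctuation, so it only suits pages where the text is limited to letters, digits, and whitespace.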

Upvotes: 0
