user2170712

Reputation: 17

regex pattern for url with no ending slash and exclude certain text in url

I'm looking for a preg_match_all pattern to find all URLs on a page that don't have a trailing slash.

For example: if I have

<a href="/testing/abc/">end with slash</a>

<a href="/testing/test/mnl">no ending slash</a>

The result would be the second link. A solution for that is posted at "find pattern for url with no ending slash".

I have tried to modify the provided pattern to also exclude URLs that contain 'images' or '.pdf', but have had no luck yet.

Thanks.

Upvotes: 0

Views: 3191

Answers (2)

Zack

Reputation: 2869

I found a way to exclude links that end in .pdf by modifying the answer from the other question. I'm still looking into why it won't exclude the images example, though.

href=(['"])[^\s]+(?<![\/]|\.pdf)\1

Link to a working test: http://www.rubular.com/r/jmBVstpGZD
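To make the behaviour concrete, here is a minimal sketch of the pattern above. It uses Python's `re` module rather than PHP's preg_match_all, so it is a port and not the exact expression from the answer: Python only accepts fixed-width lookbehinds, so the two alternatives are written as two separate lookbehinds. The sample HTML, the function name, and the second pattern (one possible way to also reject URLs containing "images") are assumptions added for illustration.

```python
import re

# Port of the pattern above. Python requires each lookbehind to be
# fixed-width, so the single lookbehind with two alternatives becomes two
# lookbehinds. The dot before "pdf" is escaped to match a literal dot.
NO_SLASH_NO_PDF = re.compile(r'''href=(['"])[^\s]+(?<!/)(?<!\.pdf)\1''')

# Hypothetical extension (not from the answer): a negative lookahead right
# after the opening quote rejects any URL that contains "images".
NO_SLASH_NO_PDF_NO_IMAGES = re.compile(
    r'''href=(['"])(?![^'"\s]*images)[^\s]+(?<!/)(?<!\.pdf)\1'''
)

# Made-up sample markup based on the links in the question.
SAMPLE_HTML = """
<a href="/testing/abc/">end with slash</a>
<a href="/testing/test/mnl">no ending slash</a>
<a href="/docs/file.pdf">pdf</a>
<a href="/images/logo">image</a>
"""


def find_hrefs(pattern, html):
    """Return the full text of every href attribute the pattern matches."""
    # finditer is used because findall would return only capture group 1.
    return [m.group(0) for m in pattern.finditer(html)]


print(find_hrefs(NO_SLASH_NO_PDF, SAMPLE_HTML))
# ['href="/testing/test/mnl"', 'href="/images/logo"']
print(find_hrefs(NO_SLASH_NO_PDF_NO_IMAGES, SAMPLE_HTML))
# ['href="/testing/test/mnl"']
```

The first result shows why the images link still gets through: the lookbehind only inspects the characters immediately before the closing quote, so text earlier in the URL is never checked.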

Upvotes: 1

sp00m

Reputation: 48807

This one should suit your needs (demo):

href="(?:(?<!images).(?!(?:[.]pdf|/)"))*?"
  • (?:) = non-capturing group
  • (?<!images). = any char not preceded by images
  • .(?!(?:[.]pdf|/)") = any char not followed by .pdf" nor by /"
  • *? = match as short as possible
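The breakdown above can be checked with a short sketch. This one uses Python's `re` module instead of PHP's preg_match_all; the pattern itself is copied unchanged, since it only uses a fixed-width lookbehind. The sample HTML and the function name are assumptions added for illustration.

```python
import re

# The pattern from this answer, unchanged. It has no capturing groups, so
# findall returns the full text of each match.
HREF_PATTERN = re.compile(r'href="(?:(?<!images).(?!(?:[.]pdf|/)"))*?"')

# Made-up sample markup based on the links in the question.
SAMPLE_HTML = """
<a href="/testing/abc/">end with slash</a>
<a href="/testing/test/mnl">no ending slash</a>
<a href="/docs/file.pdf">pdf</a>
<a href="/images/logo">image</a>
"""


def matching_hrefs(html):
    """Return every href attribute accepted by HREF_PATTERN."""
    return HREF_PATTERN.findall(html)


print(matching_hrefs(SAMPLE_HTML))
# ['href="/testing/test/mnl"']
```

Two limits are worth keeping in mind when adapting it. The pattern hard-codes double quotes, so single-quoted attributes are skipped. A bare href="/" also still matches, because the lookahead only inspects what follows each consumed character and the lone slash is followed directly by the closing quote.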

Upvotes: 2
