John Cogan

Reputation: 1060

PDF URLs matched incorrectly by a regex meant for image-only href/src URLs

I am trying to get my regular expression to match any image URL, with some optional parts. The set that matches image file extensions works fine until I add the gif extension. Once I do that, the PDF URLs get matched as well.

Could anyone shed light on this?

I am using this in PHP with the preg_match_all function.

Rules for matching

  1. Can be either src or href link
  2. Can be relative or absolute link
  3. Protocol can be http or https if given
  4. Select only the link if matched
  5. Case insensitive and global

Pattern (if I take out gif, the PDFs are skipped)

[src|href]="([(https|http):\/\/]?[^"]*.[jpg|png|jpeg|gif])"

Test strings

Should match <a href="http://blog.mysite.com/wp-content/uploads/2014/04/13061-someimage.jpg">
Should match <a href="/wp-content/uploads/2014/04/13061-someimage.jpg">
No match <a href="/wp-content/uploads/2014/04/13061-somedoc.pdf"></a>
No match <a href="/wp-content/uploads/2014/04/13061-somedoc.pdf"></a>
Should match <img href="http://blog.mysite.com/wp-content/uploads/2014/04/13061-someimage.jpg"/>
Should match <img href="/wp-content/uploads/2014/04/13061-someimage.gif"/>
Should match <img href="http://blog.mysite.com/wp-content/uploads/2014/04/13061-someimage.jpg" />
Should match <img href="/wp-content/uploads/2014/04/13061-someimage.jpg" />

regex101 fiddle: https://regex101.com/r/x3vVSx/1

Upvotes: 0

Views: 45

Answers (1)

John Cogan

Reputation: 1060

Thanks to @Micha Wiedenmann for this. Quoting the comment:

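You mixed up [ and (, you want (jpg|png|jpeg|gif) instead of [jpg|png|...]. Similarly for [src|href].

Square brackets define a character class, which matches a single character from the set, so [jpg|png|jpeg|gif] matches one of j, p, g, |, n, e, i or f. Adding gif puts f into the set, and since .pdf ends in f, the PDF URLs started matching. The . before the extension should also be escaped as \. so that it only matches a literal dot.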
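
To make the difference concrete, here is a minimal PHP sketch that runs both patterns over a few of the test strings from the question. The variable names ($html, $broken, $fixed) are only illustrative, and $fixed is one possible correction under the rules listed in the question, not a pattern taken from the original comment. It swaps the bracketed alternations for groups, shortens the protocol to https?, and escapes the dot.

```php
<?php
// A subset of the test strings from the question.
$html = <<<'HTML'
<a href="http://blog.mysite.com/wp-content/uploads/2014/04/13061-someimage.jpg">
<a href="/wp-content/uploads/2014/04/13061-someimage.jpg">
<a href="/wp-content/uploads/2014/04/13061-somedoc.pdf"></a>
<img href="/wp-content/uploads/2014/04/13061-someimage.gif"/>
<img href="http://blog.mysite.com/wp-content/uploads/2014/04/13061-someimage.jpg" />
HTML;

// Original pattern. Each [...] is a character class, so [jpg|png|jpeg|gif]
// matches ONE character out of j, p, g, |, n, e, i, f. The "f" that comes
// with "gif" is what lets a URL ending in "pdf" through.
$broken = '/[src|href]="([(https|http):\/\/]?[^"]*.[jpg|png|jpeg|gif])"/i';

// Assumed fix: non-capturing groups for the alternations, an optional
// protocol, and an escaped dot. Group 1 still captures only the link.
$fixed = '/(?:src|href)="((?:https?:\/\/)?[^"]*\.(?:jpg|png|jpeg|gif))"/i';

preg_match_all($broken, $html, $brokenMatches);
preg_match_all($fixed, $html, $fixedMatches);

print_r($brokenMatches[1]); // includes the .pdf URL
print_r($fixedMatches[1]);  // image URLs only
```

preg_match_all already finds every match in the subject, so no separate global flag is needed; the i modifier covers the case-insensitive requirement.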

Upvotes: 0
