eawedat

Reputation: 417

Regex Get only docx or doc value from href

After searching Stack Overflow, I found this regex pattern:

/href=['"]([^'"]+?)['"]/

It captures the value of every href.

Now I need to restrict that pattern to get only doc or docx values.

Note that a link may have extra characters (such as a query string) after .doc or .docx.

For example, if I have the link:

<a href="/site/file1.doc?id=1">link1</a>

Result should be:

/site/file1.doc

Upvotes: 0

Views: 413

Answers (2)

ZzCalvinzZ

Reputation: 151

/href=['"]([^'"]+?\.docx?)[^'"]*['"]/

check it out here: https://regex101.com/r/oS1cD0/2
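
A quick way to check this kind of pattern outside regex101, sketched in Python (an assumption, since the thread names no language). The sketch puts `[^'"]*` after the capture group so that any number of trailing characters, including none, may sit between the extension and the closing quote. The helper name `first_doc_href` is made up for illustration.

```python
import re

# Assumed form of the pattern: the trailing class is followed by * so it can
# match zero or more characters (e.g. "?id=1") before the closing quote.
PATTERN = re.compile(r"""href=['"]([^'"]+?\.docx?)[^'"]*['"]""")

def first_doc_href(html):
    """Return the first captured .doc/.docx path, or None if nothing matches."""
    m = PATTERN.search(html)
    return m.group(1) if m else None

print(first_doc_href('<a href="/site/file1.doc?id=1">link1</a>'))
print(first_doc_href('<a href="/site/file2.docx">link2</a>'))
print(first_doc_href('<a href="/site/page.html">link3</a>'))
```

Note that this form does not check what follows the extension, so an href such as "foo.doctor" would still yield "foo.doc".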

Upvotes: 0

Bohemian

Reputation: 424983

Try this:

/href=(['"])([^'"]+\.docx?(\?[^'"]*)?)\1/

This requires that ".doc" or ".docx" be followed either by the end of the href or by a question mark and a query string, i.e. it won't match "foo.doctor".

This also ensures that the quotes match at each end via a back reference.

See live demo.
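
As written, group 2 of the pattern above also includes the optional query string, while the question asks for only the path (/site/file1.doc). A possible adjustment, sketched in Python (an assumption, since the thread names no language), moves the query string into its own optional group and keeps `?` out of the path. The names `DOC_HREF` and `doc_links` are hypothetical.

```python
import re

# Variant of the pattern above: the query string sits in its own optional
# group, so group 2 holds only the path ending in .doc or .docx.
# "?" is excluded from the path so a greedy match cannot swallow the query.
DOC_HREF = re.compile(r"""href=(['"])([^'"?]+\.docx?)(\?[^'"]*)?\1""")

def doc_links(html):
    """Return .doc/.docx paths from href attributes, without query strings."""
    return [m.group(2) for m in DOC_HREF.finditer(html)]

html = '''
<a href="/site/file1.doc?id=1">link1</a>
<a href='/site/file2.docx'>link2</a>
<a href="/site/foo.doctor">link3</a>
<a href="/site/page.html">link4</a>
'''

print(doc_links(html))  # ['/site/file1.doc', '/site/file2.docx']
```

The back reference `\1` still forces the closing quote to match the opening one, and "foo.doctor" is still rejected because the extension must be followed by the closing quote or a question mark.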

Upvotes: 2
