Reputation: 417
After searching on Stack Overflow, I found this regex pattern:
/href=['"]([^'"]+?)['"]/
It captures the value of every href attribute.
Now I need to restrict the pattern so that it matches only links to .doc or .docx files.
Note that a link may have additional text after .docx or .doc.
For example, if I have the link:
<a href="/site/file1.doc?id=1">link1</a>
The result should be:
/site/file1.doc
Upvotes: 0
Views: 413
Reputation: 151
/href=['"]([^'"]+?\.docx?)[^'"]*['"]/
Check it out here: https://regex101.com/r/oS1cD0/2
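As a quick check, here is a minimal Python sketch of a pattern of this shape, with the trailing character class written as [^'"]* so that any amount of text may follow the extension. The thread does not name a host language, so Python, the DOC_HREF name and the sample HTML are assumptions made for illustration only.

```python
import re

# Python has no /.../ regex literals, so the slash delimiters are dropped.
# The capture group stops right after ".doc" or ".docx"; [^'"]* then consumes
# whatever follows (for example a query string) up to the closing quote.
DOC_HREF = re.compile(r"""href=['"]([^'"]+?\.docx?)[^'"]*['"]""")

# Illustrative input: one .doc link with a query string, one unrelated link.
html = '<a href="/site/file1.doc?id=1">link1</a> <a href="/site/page.html">link2</a>'

print(DOC_HREF.findall(html))  # ['/site/file1.doc']
```

Because nothing constrains what follows the extension, a pattern of this shape would also capture "foo.doc" from a link to "foo.doctor".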
Upvotes: 0
Reputation: 424983
Try this:
/href=(['"])([^'"]+\.docx?(\?[^'"]*)?)\1/
This requires that whatever follows ".doc" or ".docx" is either the end of the href or a question mark followed by the rest of the query string, i.e. it won't match "foo.doctor".
It also uses a backreference (\1) to ensure that the opening and closing quotes match.
See live demo.
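For anyone who wants to try this outside a regex tester, here is a rough Python sketch. The host language is not stated in the thread, so Python is an assumption, as are the names WITH_QUERY and PATH_ONLY and the sample HTML. Note that group 2 of the pattern as written includes the query string; PATH_ONLY is a hypothetical variant that moves the query string into a non-capturing group outside group 2, which seems closer to the result the question asks for.

```python
import re

# Pattern from the answer, minus the /.../ delimiters (Python has no regex literals).
# Group 1 is the opening quote, group 2 the href value, group 3 the optional query string.
WITH_QUERY = re.compile(r"""href=(['"])([^'"]+\.docx?(\?[^'"]*)?)\1""")

# Hypothetical variant (not from the answer): the query string is still required
# to be well-formed, but it is matched outside the captured group 2.
PATH_ONLY = re.compile(r"""href=(['"])([^'"]+\.docx?)(?:\?[^'"]*)?\1""")

# Illustrative input: double quotes, single quotes, and a ".doctor" link.
html = (
    '<a href="/site/file1.doc?id=1">link1</a> '
    "<a href='/site/file2.docx'>link2</a> "
    '<a href="/site/foo.doctor">link3</a>'
)

print([m.group(2) for m in WITH_QUERY.finditer(html)])
# ['/site/file1.doc?id=1', '/site/file2.docx']

print([m.group(2) for m in PATH_ONLY.finditer(html)])
# ['/site/file1.doc', '/site/file2.docx']
```

In both cases the "foo.doctor" link is skipped, and the \1 backreference means an href opened with a single quote must also be closed with one.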
Upvotes: 2