I am trying to use a JavaScript regex to capture the string between my domain and .html (if present), but I'm having trouble doing so. Any advice?
Regex:
www\.mysite\.com\/(.*)(\.html) // Does not match 'www.mysite.com/cat' at all
www\.mysite\.com\/(.*)(\.html)? // Matches all three, but the capture group also includes the '.html'
Test Data:
www.mysite.com/aadvark.html (capture group should be 'aadvark')
www.mysite.com/bird.html (capture group should be 'bird')
www.mysite.com/cat (capture group should be 'cat')
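To make the problem concrete, here is a small sketch that runs both regexes from the question against the test data and prints the first capture group. The helper name firstGroup is hypothetical, and the snippet assumes a plain Node.js or browser console.

```javascript
// The two regexes from the question.
const requiredHtml = /www\.mysite\.com\/(.*)(\.html)/;
const optionalHtml = /www\.mysite\.com\/(.*)(\.html)?/;

const testData = [
  "www.mysite.com/aadvark.html",
  "www.mysite.com/bird.html",
  "www.mysite.com/cat",
];

// Hypothetical helper: returns capture group 1, or null when the regex does not match.
function firstGroup(re, input) {
  const m = input.match(re);
  return m === null ? null : m[1];
}

for (const url of testData) {
  console.log(url, firstGroup(requiredHtml, url), firstGroup(optionalHtml, url));
}
// www.mysite.com/aadvark.html aadvark aadvark.html
// www.mysite.com/bird.html bird bird.html
// www.mysite.com/cat null cat
```

The first regex refuses any URL without .html; the second matches everything, but its greedy (.*) swallows the extension.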
A lot of issues like this can be fixed by being more specific than a catch-all .* pattern. If you change your .* to [^.]* (zero or more non-. characters), you'll get your expected results.
/www\.mysite\.com\/([^.]*)(\.html)?/
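A minimal sketch of this first regex applied to the test data from the question. The pageName helper is hypothetical and only wraps RegExp.prototype.exec to return group 1, or null when there is no match.

```javascript
// [^.]* stops at the first ".", leaving ".html" for the optional group.
const noDotRe = /www\.mysite\.com\/([^.]*)(\.html)?/;

// Hypothetical helper: returns the captured page name, or null if there is no match.
function pageName(url) {
  const m = noDotRe.exec(url);
  return m ? m[1] : null;
}

for (const url of [
  "www.mysite.com/aadvark.html",
  "www.mysite.com/bird.html",
  "www.mysite.com/cat",
]) {
  console.log(pageName(url));
}
// aadvark
// bird
// cat
```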
Your second regex fails because, once (\.html) is optional, the greedy .* runs all the way to the end of the string and the optional group simply matches nothing. This could also be fixed by adding ? to make the repetition "lazy" (it stops as soon as the rest of the expression can match); however, you'd then need to anchor the end of the expression with $.
/www\.mysite\.com\/(.*?)(\.html)?$/
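A quick sketch of why the $ anchor matters for the lazy version. Without $, nothing forces the lazy group to grow, so it settles on an empty match; the variable names below are illustrative only.

```javascript
// Lazy .*? plus a $ anchor: the group grows until "(\.html)?$" can match.
const lazyAnchored = /www\.mysite\.com\/(.*?)(\.html)?$/;
// Same pattern without $: the match succeeds immediately with an empty group.
const lazyUnanchored = /www\.mysite\.com\/(.*?)(\.html)?/;

const url = "www.mysite.com/aadvark.html";
console.log(lazyAnchored.exec(url)[1]);   // logs: aadvark
console.log(lazyUnanchored.exec(url)[1]); // logs an empty string
```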
I'd recommend the first regex. The second, though, is more encompassing: it also handles names that contain dots, capturing foo.bar in www.mysite.com/foo.bar.html.
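Here is a short side-by-side sketch of the two answers on a dotted name. The URL is the example from the paragraph above; the variable names are illustrative.

```javascript
const noDotRe = /www\.mysite\.com\/([^.]*)(\.html)?/;
const lazyAnchored = /www\.mysite\.com\/(.*?)(\.html)?$/;

const dotted = "www.mysite.com/foo.bar.html";
console.log(noDotRe.exec(dotted)[1]);      // logs: foo
console.log(lazyAnchored.exec(dotted)[1]); // logs: foo.bar
```

[^.]* cannot cross the first dot, so it stops at foo. The lazy group keeps growing until only .html is left before the end of the string.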