Jon

Reputation: 8531

Regex JS - Create Capture Pattern Without Specified Word

I am trying to use a JavaScript regex to capture any string between my domain and .html (if present), but I'm having trouble doing so. Any advice?

Regex:
www\.mysite\.com\/(.*)(\.html)    // Does not capture 'www.mysite.com/cat'
www\.mysite\.com\/(.*)(\.html)?   // Captures the '.html'

Test Data:
www.mysite.com/aadvark.html      (capture group should be 'aadvark')
www.mysite.com/bird.html         (capture group should be 'bird')
www.mysite.com/cat               (capture group should be 'cat')

Upvotes: 1

Views: 18

Answers (1)

Sam

Reputation: 20486

Many issues like this can be fixed by being more specific than the match-anything .* pattern. If you change your .* to [^.]* (zero or more characters other than a dot), you'll get your expected results.

/www\.mysite\.com\/([^.]*)(\.html)?/

The original fails because, once (\.html) is optional, the greedy .* runs to the end of the string and swallows the extension, leaving nothing for the optional group to match. You could also fix this by adding ? to make the repetition "lazy" (it stops as soon as the rest of the expression can match); however, you'd then need to anchor the end of the expression with a $, otherwise the lazy group would match nothing at all.

/www\.mysite\.com\/(.*?)(\.html)?$/
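To see why the anchor matters, here is a minimal sketch comparing the lazy pattern with and without the trailing $. The helper name firstGroup is hypothetical and exists only for illustration.

```javascript
// Hypothetical helper: returns capture group 1, or null when the pattern does not match.
function firstGroup(pattern, url) {
  const match = url.match(pattern);
  return match ? match[1] : null;
}

const lazyUnanchored = /www\.mysite\.com\/(.*?)(\.html)?/;
const lazyAnchored = /www\.mysite\.com\/(.*?)(\.html)?$/;

// Without $, the lazy group is satisfied by zero characters and the
// optional group is simply skipped, so group 1 is the empty string.
const unanchoredBird = firstGroup(lazyUnanchored, "www.mysite.com/bird.html");

// With $, the lazy group has to grow until the rest of the pattern
// can reach the end of the string.
const anchoredBird = firstGroup(lazyAnchored, "www.mysite.com/bird.html");
const anchoredCat = firstGroup(lazyAnchored, "www.mysite.com/cat");

console.log(JSON.stringify(unanchoredBird)); // ""
console.log(JSON.stringify(anchoredBird));   // "bird"
console.log(JSON.stringify(anchoredCat));    // "cat"
```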

I'd recommend the first pattern. However, the second is more encompassing, since it captures names that contain dots, such as foo.bar in www.mysite.com/foo.bar.html.
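As a rough check, the two patterns can be run side by side against the test data from the question plus the foo.bar case. This is only a sketch; the names capture, urls, and results are assumptions made for the example.

```javascript
const nonDot = /www\.mysite\.com\/([^.]*)(\.html)?/;
const lazy = /www\.mysite\.com\/(.*?)(\.html)?$/;

const urls = [
  "www.mysite.com/aadvark.html",
  "www.mysite.com/bird.html",
  "www.mysite.com/cat",
  "www.mysite.com/foo.bar.html",
];

// Hypothetical helper: capture group 1, or null if the pattern does not match.
function capture(pattern, url) {
  const match = pattern.exec(url);
  return match ? match[1] : null;
}

const results = urls.map((url) => ({
  url,
  nonDot: capture(nonDot, url),
  lazy: capture(lazy, url),
}));

for (const row of results) {
  console.log(`${row.url} -> [^.]*: ${row.nonDot}, .*?$: ${row.lazy}`);
}
// The two patterns agree on the first three URLs. For foo.bar.html,
// [^.]* stops at the first dot ("foo"), while the lazy, anchored
// pattern captures "foo.bar".
```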

Upvotes: 1
