CptNemo

Reputation: 6755

Remove ending of string with gsub

I have two possible endings for my string. The first with no numbers:

http://www.something.com/test.html

The second with numbers (up to two digits):

http://www.something.com/test-1.html
http://www.something.com/test-2.html
http://www.something.com/test-3.html
http://www.something.com/test-4.html
http://www.something.com/test-15.html

I need to strip .html in the first case and -1.html (or whatever the number is) in the second. The idea is to make the two kinds of string comparable so that I can find duplicates.

I think the following should manage the second case:

gsub("-[0-9]|[1-9][0-9].html", "", string)

but is it possible to have a function to manage both cases?

Upvotes: 0

Views: 1825

Answers (1)

Jerry

Reputation: 71538

You can perhaps use something like this:

(-[0-9]+)?\\.html

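The group (-[0-9]+)? optionally matches a hyphen followed by one or more digits, so the same pattern covers both cases. Note that it's safer to escape the dot, because an unescaped dot will match any character. Inside an R string the backslash itself has to be doubled, hence \\.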
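As a rough sketch of how this might be used from R (the vector names urls, stripped and is_dup are made up for illustration, the fourth URL is an invented non-duplicate, and the trailing $ anchor is my own addition to keep the match at the end of the string):

```r
# Hypothetical sample data based on the URLs in the question,
# plus one invented URL that should not count as a duplicate.
urls <- c(
  "http://www.something.com/test.html",
  "http://www.something.com/test-1.html",
  "http://www.something.com/test-15.html",
  "http://www.something.com/other.html"
)

# Optional "-<digits>" group, then a literal ".html".
# The "$" anchor is an assumption added here, not part of the pattern above.
stripped <- sub("(-[0-9]+)?\\.html$", "", urls)

# TRUE marks entries whose stripped form already appeared earlier in the vector.
is_dup <- duplicated(stripped)
```

Since the pattern can match at most once per string when anchored, sub() is enough here; gsub() should give the same result. If you really want to limit it to at most two digits, [0-9]{1,2} in place of [0-9]+ would be one way to do that.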


Upvotes: 2
