Jossy

Reputation: 999

How to get all text after the second to last instance of a character excluding the last instance?

I have the following partial URL:

"/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l/"

I'd like to get everything after the second to last / and excluding the final / so:

"hernych-jan-monfils-gael-S8Lm3D4l"

I've got as far as:

re.search(r".*/(.*?/.*)", url)

Which gets me:

"hernych-jan-monfils-gael-S8Lm3D4l/"

But I can't figure out how to get rid of the final slash. Could someone point me in the right direction?

Upvotes: 1

Views: 775

Answers (4)

Lisa

Reputation: 204

For a more Pythonic approach, you can also use:

"/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l/".split('/')[-2]

str.split returns a list of the substrings found between occurrences of the given delimiter (in this case, '/'). Breaking the statement above down:

s = "/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l/".split('/')

print(s)
# ['', 'tennis', 'qatar', 'atp-doha-2009', 'hernych-jan-monfils-gael-S8Lm3D4l', '']

print(s[-2])
# hernych-jan-monfils-gael-S8Lm3D4l
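Note that [-2] relies on the trailing slash being present: without it, the last element of the list is the segment itself and [-2] returns its parent. A minimal sketch that handles both cases, assuming a hypothetical helper name last_segment (not part of the original answer), strips trailing slashes first and then splits once from the right:

```python
def last_segment(url: str) -> str:
    """Return the final non-empty path segment (hypothetical helper)."""
    # Drop any trailing slashes, then split only once, from the right.
    return url.rstrip("/").rsplit("/", 1)[-1]


with_slash = "/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l/"
without_slash = "/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l"

print(last_segment(with_slash))
print(last_segment(without_slash))
```

Both calls should print the same segment, since the trailing slash is removed before splitting.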

Upvotes: 1

sTonystork

Reputation: 117

This requires one more step, but since Python strings support slicing, you can slice off the last character of the string your regex returns:

url = "hernych-jan-monfils-gael-S8Lm3D4l/"

url = url[:-1]

That removes the trailing slash, though I'm sure there are other ways to do it in one line.
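Applied to the regex from the question, the two steps might look like the sketch below. The variable names are illustrative, and the endswith guard is an added assumption so that a string without a trailing slash is left untouched:

```python
import re

url = "/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l/"

# Step 1: the regex from the question captures the last segment plus its slash.
match = re.search(r".*/(.*?/.*)", url)
segment = match.group(1)

# Step 2: slice off the final character, but only if it really is a slash.
trimmed = segment[:-1] if segment.endswith("/") else segment

print(trimmed)
```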

Upvotes: 0

vvvvv

Reputation: 31619

You can do it like this:

s = "/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l/"
s = "/".join(s.split("/")[-2:])  # same result as your regex: "hernych-jan-monfils-gael-S8Lm3D4l/"
s = s.rstrip("/")  # remove the trailing slash

Upvotes: 1

Ryszard Czech

Reputation: 18611

Use

^.*?/([^/]*)/?$

Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  .*?                      any character except \n (0 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  /                        '/'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^/]*                    any character except: '/' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  /?                       '/' (optional (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string

Python code:

import re
regex = r"^.*?/([^/]*)/?$"
text = "/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l/"
print(re.findall(regex, text))

Result: ['hernych-jan-monfils-gael-S8Lm3D4l']
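re.findall returns a list, so if you want the segment as a plain string, a sketch along these lines may be more convenient. The helper name last_segment_re is hypothetical; it uses re.search with the same pattern and returns None when nothing matches:

```python
import re

# Same pattern as above, compiled once for reuse.
PATTERN = re.compile(r"^.*?/([^/]*)/?$")


def last_segment_re(text):
    """Return the captured last segment, or None if there is no match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(last_segment_re("/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l/"))
print(last_segment_re("/tennis/qatar/atp-doha-2009/hernych-jan-monfils-gael-S8Lm3D4l"))
```

Because the final slash is optional in the pattern, both inputs should yield the same segment.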

Upvotes: 3
