kms

Reputation: 2024

Extract substring from urls stored in a pandas column

A pandas column contains a series of URLs. I'd like to extract a substring from each URL. MRE code below.

import pandas as pd

s = pd.Series(['https://url-location/img/xxxyyy_image1.png'])

s.apply(lambda x: x[x.find("/")+1:x.find("_")])

I'd like to extract xxxyyy from each URL and store it in a new column.

Upvotes: 2

Views: 576

Answers (2)

Andreas

Reputation: 9197

Also possible:

s.str.split('/').str[-1].str.split('_').str[0]
# Out[224]:
# 0    xxxyyy
# dtype: object

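This works because the .str accessor supports indexing and slice notation on each element. After a split, each element is a list, so .str[-1] returns the last item of each list (the file name) and .str[0] returns the first item (the part before the underscore).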
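To store the result in a new column, as the question asks, the same chain can be split into two steps. This is a minimal sketch; the DataFrame and the column names url and code are hypothetical and not from the question:

```python
import pandas as pd

# Hypothetical frame and column names, for illustration only
df = pd.DataFrame({"url": ["https://url-location/img/xxxyyy_image1.png"]})

# Step 1: split on "/" and keep the last piece, the file name
filename = df["url"].str.split("/").str[-1]

# Step 2: split the file name on "_" and keep the first piece
df["code"] = filename.str.split("_").str[0]
```

Keeping the intermediate filename Series makes it easier to inspect each step if some URLs do not follow the expected pattern.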

Upvotes: 1

Wiktor Stribiżew

Reputation: 626932

You can use

>>> s.str.extract(r'.*/([^_]+)')
        0
0  xxxyyy

See the regex demo. Details:

  • .* - zero or more chars other than line break chars, as many as possible (greedy), so the match runs up to the last slash
  • / - a slash
  • ([^_]+) - Capturing group 1 (the value captured into this group will be the actual return value of Series.str.extract): one or more chars other than _ char.
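Since Series.str.extract returns a DataFrame by default, assigning to a new column is simpler with expand=False, which returns a Series when the pattern has a single capturing group. A minimal sketch, assuming a hypothetical DataFrame with columns named url and code, plus a made-up second row to show the no-match case:

```python
import pandas as pd

# Hypothetical frame and column names; the second row is invented
# to show what happens when the pattern does not match
df = pd.DataFrame({
    "url": [
        "https://url-location/img/xxxyyy_image1.png",
        "no-slash-here",
    ]
})

# expand=False yields a Series for a single capturing group
df["code"] = df["url"].str.extract(r".*/([^_]+)", expand=False)
```

Rows where the pattern does not match get NaN in the new column, so they can be found afterwards with df["code"].isna().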

Upvotes: 3
