Extract the last number in a dataframe with strings in Python

Question

I have a pandas DataFrame, and I am not sure how to start extracting the information I need.

The data in the column is like this:

->Company A 100->Company B 60->Company C 80-> Company D
->Company A 100->Company B 53.1->Company C 82.2->Company D 100-> Company G
->Company A 100->Company B 23-> Company D

etc...

The company information is not important; only the last number in each row matters.

I want to extract the number before the last -> and paste it into another column.

For example, extract 80, 100, and 23, respectively.

80
100
23

I have this

import re
text = '->Company A 100->Company B 60->Company C 80-> Company D'
re.findall(r'\d+', text)[-1]

which gives the correct output, '80'.

But when I apply it to a DataFrame column:

re.findall(r'\d+', df['ColumnName'])[-1]

I get: TypeError: expected string or bytes-like object

Mohit Motwani · Accepted Answer

You can try the following if you want the last number of the string:

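# Reverse each string so the last number comes first, then capture the first run of digits
df['num'] = df['text'].str[::-1].str.extract(r'(\d+)')
# Reverse the captured digits back into their original order
df['num'] = df['num'].str[::-1]
df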

           text                                         num
0   -> Company A 100->Company B 60->Company C 80->...   80
1   -> Company A 100->Company B 53.1->Company C 82...   100
2   -> Company A 100->Company B 23-> Company D          23
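The TypeError in the question arises because re.findall expects a single string, while df['ColumnName'] is a whole Series. A closer pandas counterpart of the original re.findall(...)[-1] attempt might look like the sketch below. It assumes the sample rows from the question and the column name text used in this answer, and that company names never contain digits of their own.

```python
import pandas as pd

# Sample rows copied from the question; the column name "text" follows this answer.
df = pd.DataFrame({
    "text": [
        "->Company A 100->Company B 60->Company C 80-> Company D",
        "->Company A 100->Company B 53.1->Company C 82.2->Company D 100-> Company G",
        "->Company A 100->Company B 23-> Company D",
    ]
})

# Series.str.findall runs the regex on every row and returns one list of matches per row;
# .str[-1] then picks the last match out of each list.
df["num"] = df["text"].str.findall(r"\d+").str[-1]

print(df["num"].tolist())  # ['80', '100', '23']
```

The results are strings, just as with re.findall on a single string.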

You can also try this if you only want the number before the last '->':

# Split on '->', take the second-to-last piece, and capture its first run of digits
df['num'] = df['text'].str.split('->').str[-2].str.extract(r'(\d+)')

           text                                         num
0   -> Company A 100->Company B 60->Company C 80->...   80
1   -> Company A 100->Company B 53.1->Company C 82...   100
2   -> Company A 100->Company B 23-> Company D          23
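For reference, here is a self-contained sketch of the split-based approach, assuming the sample rows from the question and the column name text. Passing expand=False is a choice made here so that extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample rows copied from the question; the column name "text" follows this answer.
df = pd.DataFrame({
    "text": [
        "->Company A 100->Company B 60->Company C 80-> Company D",
        "->Company A 100->Company B 53.1->Company C 82.2->Company D 100-> Company G",
        "->Company A 100->Company B 23-> Company D",
    ]
})

# The piece at index -2 is the segment just before the last '->',
# for example 'Company C 80' in the first row.
segment = df["text"].str.split("->").str[-2]

# With a single capture group and expand=False, extract returns a Series of strings.
df["num"] = segment.str.extract(r"(\d+)", expand=False)

print(df["num"].tolist())  # ['80', '100', '23']
```

Rows whose segment contains no digits would get NaN instead of a string.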

If the number before the last '->' may also be a float:

# The optional group (?:\.\d+)? also captures a decimal part when present
df['text'].str.split('->').str[-2].str.extract(r'(\d+(?:\.\d+)?)')

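The only change is in the regex: the optional group (?:\.\d+)? also matches a decimal part, so a value such as 53.1 is captured whole instead of stopping at 53.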
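A short sketch of the float-aware variant follows. The fourth row is a hypothetical addition (not from the question) so that a decimal value actually appears before the last '->', and the pd.to_numeric step is an optional extra for when numeric values are wanted instead of strings.

```python
import pandas as pd

df = pd.DataFrame({
    "text": [
        "->Company A 100->Company B 60->Company C 80-> Company D",
        "->Company A 100->Company B 53.1->Company C 82.2->Company D 100-> Company G",
        "->Company A 100->Company B 23-> Company D",
        # Hypothetical row, added only to show a float before the last '->'.
        "->Company A 100->Company B 53.1-> Company C",
    ]
})

# Capture an integer or a decimal number from the segment before the last '->'.
extracted = (
    df["text"]
    .str.split("->")
    .str[-2]
    .str.extract(r"(\d+(?:\.\d+)?)", expand=False)
)

# Convert the captured strings to numbers; unmatched rows (NaN) stay NaN.
df["num"] = pd.to_numeric(extracted)

print(df["num"].tolist())  # [80.0, 100.0, 23.0, 53.1]
```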
