Mataunited18

Reputation: 628

Extract the last number in a dataframe with strings in Python

I have a pandas DataFrame and I am not sure how to start extracting the information I need.

The data in the column is like this:

->Company A 100->Company B 60->Company C 80-> Company D
->Company A 100->Company B 53.1->Company C 82.2->Company D 100-> Company G
->Company A 100->Company B 23-> Company D

etc...

The company information is not important; I only need the last number in each row.

I want to extract the number before the last -> and put it into another column.

For example, extract 80, 100, and 23, respectively.

80
100
23

I have this

import re
text = '->Company A 100->Company B 60->Company C 80-> Company D'
re.findall(r'\d+', text)[-1]

which gives the right output, '80'.

But when I do the same on a DataFrame column:

re.findall(r'\d+', df['ColumnName'])[-1]

I get: TypeError: expected string or bytes-like object

Upvotes: 2

Views: 2496

Answers (2)

Mohit Motwani

Reputation: 4792

You can try the following if you want the last number of the string:

df['num'] = df['text'].str[::-1].str.extract(r'(\d+)')
df['num'] = df['num'].str[::-1]
df

           text                                         num
0   -> Company A 100->Company B 60->Company C 80->...   80
1   -> Company A 100->Company B 53.1->Company C 82...   100
2   -> Company A 100->Company B 23-> Company D          23

You can also try this if you only want the number before the last '->':

df['num'] = df['text'].str.split('->').str[-2].str.extract(r'(\d+)')

           text                                         num
0   -> Company A 100->Company B 60->Company C 80->...   80
1   -> Company A 100->Company B 53.1->Company C 82...   100
2   -> Company A 100->Company B 23-> Company D          23

If you also want to capture floats before the last '->':

df['text'].str.split('->').str[-2].str.extract(r'(\d+(?:\.\d+)?)')

The only change is in the regex, which now accepts an optional decimal part.
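
As a self-contained illustration of the split-then-extract approach, here is a minimal sketch using the three sample rows from the question. The column name 'text' follows this answer; the 'num_value' column and the use of expand=False and pd.to_numeric are additions for the example, not part of the original snippets.

```python
import pandas as pd

# Sample rows copied from the question; 'text' is the column name used in this answer.
df = pd.DataFrame({
    "text": [
        "->Company A 100->Company B 60->Company C 80-> Company D",
        "->Company A 100->Company B 53.1->Company C 82.2->Company D 100-> Company G",
        "->Company A 100->Company B 23-> Company D",
    ]
})

# Keep only the segment between the last two '->' separators.
segment = df["text"].str.split("->").str[-2]

# expand=False returns a Series instead of a one-column DataFrame.
df["num"] = segment.str.extract(r"(\d+(?:\.\d+)?)", expand=False)

# The extracted values are strings; convert them if numbers are needed.
# 'num_value' is a hypothetical column name chosen for this sketch.
df["num_value"] = pd.to_numeric(df["num"])

print(df[["num", "num_value"]])
```

Note that str.extract returns the first match within the segment, so this relies on each segment containing a single number.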

Upvotes: 3

jezrael

Reputation: 862671

Use a regex that matches floats or integers with str.findall, then take the last match by indexing:

df['num'] = df['col'].str.findall(r'([-+]?\d*\.\d+|\d+)').str[-1]
print(df)


                                                 col  num
0  -> Company A 100->Company B 60->Company C 80->...   80
1  -> Company A 100->Company B 53.1->Company C 82...  100
2         -> Company A 100->Company B 23-> Company D   23
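
On the TypeError in the question: re.findall expects a single string, whereas df['ColumnName'] is a whole Series, so the pattern has to be applied per element. The sketch below, using the question's three sample rows, shows the vectorized Series.str.findall form next to an equivalent per-row re.findall. The helper last_number and the column 'num_apply' are hypothetical names introduced only for this example.

```python
import re
import pandas as pd

# Sample rows copied from the question; 'col' is the column name used in this answer.
df = pd.DataFrame({
    "col": [
        "->Company A 100->Company B 60->Company C 80-> Company D",
        "->Company A 100->Company B 53.1->Company C 82.2->Company D 100-> Company G",
        "->Company A 100->Company B 23-> Company D",
    ]
})

# Matches a float such as 53.1 first, otherwise a plain integer.
pattern = r"[-+]?\d*\.\d+|\d+"

# Vectorized: find all numbers in each row, then keep the last one.
df["num"] = df["col"].str.findall(pattern).str[-1]


def last_number(text):
    """Return the last number found in one string, or None if there is none."""
    matches = re.findall(pattern, text)
    return matches[-1] if matches else None


# Per-row equivalent: re.findall receives one string at a time.
df["num_apply"] = df["col"].apply(last_number)

print(df[["num", "num_apply"]])
```

Both forms return the last number anywhere in the string, so they match the "number before the last ->" only when the final company name contains no digits.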

Upvotes: 3
