Reputation: 628
I have a pandas DataFrame, and I am not sure how to start extracting the information I need from one of its columns.
The data in the column is like this:
->Company A 100->Company B 60->Company C 80-> Company D
->Company A 100->Company B 53.1->Company C 82.2->Company D 100-> Company G
->Company A 100->Company B 23-> Company D
etc...
The company names are not important; I only need the last number in each row.
I want to extract the number before the last -> and put it into another column.
For example, extract 80, 100, and 23, respectively:
80
100
23
I have this
import re
text = '->Company A 100->Company B 60->Company C 80-> Company D'
re.findall(r'\d+', text)[-1]
which gives the correct output, '80'.
But when I apply it to a DataFrame column:
re.findall(r'\d+', df['ColumnName'])[-1]
I get: TypeError: expected string or bytes-like object
Upvotes: 2
Views: 2496
Reputation: 4792
You can try the following if you want the last number of the string:
# Reverse each string, grab the first run of digits, then reverse it back.
df['num'] = df['text'].str[::-1].str.extract(r'(\d+)')
df['num'] = df['num'].str[::-1]
df
                                                text  num
0  -> Company A 100->Company B 60->Company C 80->...   80
1  -> Company A 100->Company B 53.1->Company C 82...  100
2         -> Company A 100->Company B 23-> Company D   23
You can also try this if you only want the number before the last '->':
df['num'] = df['text'].str.split('->').str[-2].str.extract(r'(\d+)')
                                                text  num
0  -> Company A 100->Company B 60->Company C 80->...   80
1  -> Company A 100->Company B 53.1->Company C 82...  100
2         -> Company A 100->Company B 23-> Company D   23
If the number before the last '->' may also be a float:
df['text'].str.split('->').str[-2].str.extract(r'(\d+(?:\.\d+)?)')
The only change is in the regex, which now also accepts an optional decimal part.
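As a rough end-to-end illustration of the split-based approach, here is a minimal sketch using the three sample rows from the question. The column name 'text' and the intermediate variable segment are assumptions made for the example, and expand=False is used only so that extract returns a Series instead of a one-column DataFrame.

```python
import pandas as pd

# Sample rows copied from the question; the column name 'text' is an assumption.
df = pd.DataFrame({'text': [
    '->Company A 100->Company B 60->Company C 80-> Company D',
    '->Company A 100->Company B 53.1->Company C 82.2->Company D 100-> Company G',
    '->Company A 100->Company B 23-> Company D',
]})

# Keep only the segment that sits just before the last '->'.
segment = df['text'].str.split('->').str[-2]

# Pull an integer or decimal number out of that segment.
# expand=False returns a Series rather than a one-column DataFrame.
df['num'] = segment.str.extract(r'(\d+(?:\.\d+)?)', expand=False)

print(df['num'].tolist())  # ['80', '100', '23']
```

The extracted values are strings; if numeric values are needed, passing the column through pd.to_numeric should convert them.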
Upvotes: 3
Reputation: 862671
Use a regex that matches floats or integers with str.findall, then take the last match in each row by indexing:
df['num'] = df['col'].str.findall(r'([-+]?\d*\.\d+|\d+)').str[-1]
print(df)
                                                 col  num
0  -> Company A 100->Company B 60->Company C 80->...   80
1  -> Company A 100->Company B 53.1->Company C 82...  100
2         -> Company A 100->Company B 23-> Company D   23
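For reference, a self-contained sketch of the same findall idea on the question's sample rows, with the result converted to numbers. The column name 'col' follows the snippet above; the variable matches and the pd.to_numeric step are additions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({'col': [
    '->Company A 100->Company B 60->Company C 80-> Company D',
    '->Company A 100->Company B 53.1->Company C 82.2->Company D 100-> Company G',
    '->Company A 100->Company B 23-> Company D',
]})

# Each row becomes a list of every integer or float found in the string.
matches = df['col'].str.findall(r'([-+]?\d*\.\d+|\d+)')

# .str[-1] picks the last element of each list; to_numeric turns it into a number.
df['num'] = pd.to_numeric(matches.str[-1])

print(df['num'].tolist())  # [80, 100, 23]
```

Note that this takes the last number anywhere in the string. It matches the number before the last '->' here only because the final company name contains no digits; if it might, splitting on '->' first is probably the safer choice.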
Upvotes: 3