I want to extract all the numbers that appear before each -> symbol. For now I only have this:
df['New'] = df['Companies'].str.findall(r'(\d+(?:\.\d+)?)').str[-1]
which only extracts the number before the last ->.
I modified it slightly to this:
df['New'] = df['Companies'].str.findall(r'(\d+(?:\.\d+)?)')
but that did not give me what I wanted. I want something like this:
Companies New New2 New3
0 -> Company A 100->Company B 60->Company C 80->... 100 60 80
1 -> Company A 100->Company B 53.1->Company C 82... 100 53.1 82
2 -> Company A 100->Company B 23-> Company D 100 23
3 -> Company 1 100->Company B 30-> Company D 100 30
Note that there can be more than 3 New columns, depending on how many -> there are in the strings. Also, some of the company names contain integers, which I do not want to include in the new columns.
Could you help me with this?
Reputation: 863341
Use Series.str.extractall with Series.unstack and DataFrame.add_prefix, with a pattern that captures the integers or floats directly before ->:
pat = r'(\d*\.\d+|\d+\.?)->'
df = df.join(df['Companies'].str.extractall(pat)[0].unstack().add_prefix('New'))
print (df)
Companies New0 New1 New2
0 -> Company A 100->Company B 60->Company C 80-> 100 60 80
1 -> Company A 100->Company B 53.1->Company C 82 100 53.1 NaN
2 -> Company A 100->Company B 23-> Company D ... 100 23 NaN
3 -> Company 1 100->Company B 30-> Company D 100 30 NaN
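To see why digits inside company names are skipped, it may help to try the same pattern on a single string with the standard re module. This is only a sketch: the sample string is made up for illustration and is not taken from the original data.

```python
import re

# Same pattern as above: a float or an integer that is immediately followed by "->"
pat = r'(\d*\.\d+|\d+\.?)->'

# Hypothetical sample string: "1" is part of a company name, "82" has no "->" after it
sample = "-> Company 1 100->Company B 53.1->Company C 82"

# findall returns only the captured group, i.e. the number without the arrow
numbers = re.findall(pat, sample)
print(numbers)
```

The "1" in "Company 1" is followed by a space rather than ->, and the trailing "82" has no -> after it, so neither is captured. That is also why the last column is NaN for rows whose final number is not followed by ->.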
If you need floats:
df = df.join(df['Companies'].str.extractall(pat)[0].astype(float).unstack().add_prefix('New'))
print (df)
Companies New0 New1 New2
0 -> Company A 100->Company B 60->Company C 80-> 100.0 60.0 80.0
1 -> Company A 100->Company B 53.1->Company C 82 100.0 53.1 NaN
2 -> Company A 100->Company B 23-> Company D ... 100.0 23.0 NaN
3 -> Company 1 100->Company B 30-> Company D 100.0 30.0 NaN
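For anyone who wants to run this end to end, here is a minimal self-contained sketch of the float variant. The DataFrame contents are assumptions modelled on the question (the original strings are partly truncated in the output above), and the intermediate names matches, wide and result are only illustrative.

```python
import pandas as pd

# Illustrative data, not the original frame
df = pd.DataFrame({
    "Companies": [
        "-> Company A 100->Company B 60->Company C 80->",
        "-> Company A 100->Company B 53.1->Company C 82",
        "-> Company A 100->Company B 23-> Company D",
        "-> Company 1 100->Company B 30-> Company D",
    ]
})

pat = r'(\d*\.\d+|\d+\.?)->'

# One row per match, indexed by (original row, match number); column 0 is the capture group
matches = df['Companies'].str.extractall(pat)[0].astype(float)

# Pivot the match number into columns and name them New0, New1, ...
wide = matches.unstack().add_prefix('New')

# Align back onto the original frame by index
result = df.join(wide)
print(result)
```

Rows with fewer matches than the widest row get NaN in the remaining columns, so the number of New columns follows the row with the most -> separated numbers.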