Reputation: 101
I have a pandas DataFrame.
ID             status
5
from 4.3 to 5  yes
from 6 to 7.2  yes
6.3
6
I want to add another column, col. If status is missing, col should be the ID value itself; otherwise it should be the first number that appears in the ID string.
The result should look like this:
ID             status  col
5                      5
from 4.3 to 5  yes     4.3
from 6 to 7.2  yes     6
6.3                    6.3
6                      6
Sorry for the very bad presentation of the question.
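For anyone who wants to reproduce the problem, the sample data can be built roughly like this. This is only a sketch: it assumes the ID values are stored as strings and that a missing status is represented by None, neither of which is stated explicitly in the question.

```python
import pandas as pd

# Assumption: ID holds strings and a missing status is None
df = pd.DataFrame({
    "ID": ["5", "from 4.3 to 5", "from 6 to 7.2", "6.3", "6"],
    "status": [None, "yes", "yes", None, None],
})

# The values the new column is expected to hold, row by row
expected = ["5", "4.3", "6", "6.3", "6"]

print(df)
```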
Upvotes: 1
Views: 107
Reputation: 195418
Another method, if you prefer not to use regular expressions:
df['col'] = df['ID'].apply(lambda x: x if len(str(x).split()) == 1 else str(x).split()[1])
print(df)
              ID status  col
0              5           5
1  from 4.3 to 5    yes  4.3
2  from 6 to 7.2    yes    6
3            6.3         6.3
4              6           6
Upvotes: 2
Reputation: 26
This is correct; please check it:
else:
df.loc[i, 'col'] = re.findall(r'\d+', df.loc[i, 'ID'])[0]
Upvotes: 0
Reputation: 43
def fun(x, y):
    # When status is present, take the text between "from " and " to "; otherwise keep x as is
    return x.split("from ")[1].split(" to ")[0] if pd.notnull(y) else x

df["sep"] = df.apply(lambda x: fun(x["ID"], x["status"]), axis=1)
df
              ID status  sep
0              5   None    5
1  from 4.3 to 5    yes  4.3
2  from 6 to 7.2    yes    6
3            6.3   None  6.3
4              6   None    6
This assumes the ID column contains strings.
Upvotes: 1
Reputation: 323226
Using findall (with a raw string so the backslashes reach the regex engine unchanged):
df.ID.str.findall(r'[-+]?\d*\.\d+|\d+').str[0]
0 5
1 4.3
2 6
3 6.3
4 6
Name: ID, dtype: object
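The expression above returns the first number for every row regardless of status. If the status condition from the question also matters, one way to combine the two might look like the sketch below. It assumes ID holds strings and missing status is None; the name first_num is just illustrative, and str.extract is swapped in for findall because it returns the first match directly.

```python
import pandas as pd

# Assumed sample data: ID as strings, missing status as None
df = pd.DataFrame({
    "ID": ["5", "from 4.3 to 5", "from 6 to 7.2", "6.3", "6"],
    "status": [None, "yes", "yes", None, None],
})

# First number (decimal or integer) found in each ID string
first_num = df["ID"].str.extract(r"([-+]?\d*\.\d+|\d+)", expand=False)

# Keep ID where status is missing, otherwise use the extracted number
df["col"] = df["ID"].where(df["status"].isna(), first_num)

print(df)
```

Series.where keeps the original value wherever the condition is True and takes the replacement elsewhere, so no explicit loop is needed.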
Upvotes: 3
Reputation: 642
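You can loop over the rows and check each one (this assumes the default 0..n-1 index):
import re
import pandas as pd

df['col'] = None
for i in range(len(df)):
    # NaN never compares equal to itself, so test with pd.isna rather than ==
    if pd.isna(df.loc[i, 'status']):
        df.loc[i, 'col'] = df.loc[i, 'ID']
    else:
        # Try the decimal pattern first so '4.3' is not cut down to '4'
        df.loc[i, 'col'] = re.findall(r'\d+\.\d+|\d+', df.loc[i, 'ID'])[0]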
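A note on the missing-value check in a loop like this: NaN is defined to be unequal to everything, including itself, so an equality comparison against NaN never selects the missing rows. A small sketch of the difference, where value is only a stand-in for a missing status cell:

```python
import numpy as np
import pandas as pd

value = np.nan  # stand-in for a missing status cell

# Equality against NaN is always False, even for NaN itself
print(value == np.nan)

# pd.isna recognises NaN and also None
print(pd.isna(value))
print(pd.isna(None))
```

This is why pd.isna (or Series.isna for a whole column) is the safer test for missing values.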
Upvotes: 1