Mohd Bilal

Reputation: 101

How to add a column to a DataFrame based on a condition on another column?

I have a pandas DataFrame:

ID              status

5                  
from 4.3 to 5   yes   
from 6 to 7.2   yes
6.3
6

I want to add another column, col. If status is missing, col should be the ID value itself; otherwise it should be the first number in the ID string (for example 4.3 from "from 4.3 to 5").

The result should look like this:

ID              status    col

5                         5
from 4.3 to 5   yes       4.3
from 6 to 7.2   yes       6
6.3                       6.3
6                         6

Sorry for the poor formatting of the question.
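A minimal vectorised sketch of the requested logic, assuming ID is stored as strings and a missing status is NaN (the sample frame is reconstructed from the table above): `str.extract` pulls the first number out of each ID, and `Series.where` keeps ID wherever status is missing and uses the extracted number elsewhere.

```python
import numpy as np
import pandas as pd

# Sample data from the question; ID is assumed to be stored as strings
# and a missing status as NaN.
df = pd.DataFrame({
    "ID": ["5", "from 4.3 to 5", "from 6 to 7.2", "6.3", "6"],
    "status": [np.nan, "yes", "yes", np.nan, np.nan],
})

# First number in each ID: digits with an optional decimal part.
first_number = df["ID"].str.extract(r"(\d+(?:\.\d+)?)", expand=False)

# Keep ID where status is missing, otherwise use the extracted number.
df["col"] = df["ID"].where(df["status"].isna(), first_number)
print(df)
```

Unlike a row-by-row loop, this keys directly on the status column, which is the condition the question describes.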

Upvotes: 1

Views: 107

Answers (5)

Andrej Kesely

Reputation: 195418

Another method, if you prefer not to use regular expressions:

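# A one-word ID is kept as-is; otherwise take the second word ("4.3" in "from 4.3 to 5").
# Note that this checks the shape of ID rather than the status column.
df['col'] = df['ID'].apply(lambda x: x if len(str(x).split()) == 1 else str(x).split()[1])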
print(df)

              ID status  col
0              5           5
1  from 4.3 to 5    yes  4.3
2  from 6 to 7.2    yes    6
3            6.3         6.3
4              6           6
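With this approach (and the other answers here) col ends up with object dtype, because the extracted values are strings. If numeric values are needed afterwards, a sketch using `pd.to_numeric` on a hypothetical column of extracted strings:

```python
import pandas as pd

# Hypothetical result column: extracted values are strings, so dtype is object.
col = pd.Series(["5", "4.3", "6", "6.3", "6"], dtype=object)

# pd.to_numeric parses the strings; a decimal value makes the result float64.
# errors="coerce" would turn any unparseable entry into NaN instead of raising.
col_numeric = pd.to_numeric(col)
print(col_numeric.dtype)   # float64
```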

Upvotes: 2

mohit kaushik

Reputation: 26

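A corrected else branch for the loop-based approach. The pattern also captures the decimal part (4.3 rather than 4), and str() guards against non-string IDs:

else:
    df.loc[i, 'col'] = re.findall(r'\d+(?:\.\d+)?', str(df.loc[i, 'ID']))[0]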

Upvotes: 0

ksh22

Reputation: 43

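import pandas as pd

def fun(x, y):
    # Missing status: keep ID; otherwise take the number between "from " and " to "
    return x.split("from ")[1].split(" to ")[0] if pd.notnull(y) else x

df["sep"] = df.apply(lambda row: fun(row["ID"], row["status"]), axis=1)
df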

    ID             status   sep
0   5              None     5
1   from 4.3 to 5   yes     4.3
2   from 6 to 7.2   yes     6
3   6.3            None     6.3
4   6              None     6

This assumes the ID column holds strings and that every row with a status has an ID of the form "from A to B".

Upvotes: 1

BENY

Reputation: 323226

Using str.findall:

df.ID.str.findall(r'[-+]?\d*\.\d+|\d+').str[0]
0      5
1    4.3
2      6
3    6.3
4      6
Name: ID, dtype: object
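To see what the pattern does, it can be tried on single strings with `re.findall`. The sketch also shows a caveat worth knowing: the `.str` accessor returns NaN for non-string elements, so this only works if every ID is stored as a string.

```python
import re
import pandas as pd

pattern = r"[-+]?\d*\.\d+|\d+"

# The first alternative matches a decimal such as 4.3 (optionally signed);
# the second matches a plain run of digits.
print(re.findall(pattern, "from 4.3 to 5"))   # ['4.3', '5']
print(re.findall(pattern, "from 6 to 7.2"))   # ['6', '7.2']

# A non-string element in an object column (here the float 6.3) yields NaN.
s = pd.Series(["from 4.3 to 5", 6.3], dtype=object)
first = s.str.findall(pattern).str[0]
print(first.tolist())   # ['4.3', nan]
```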

Upvotes: 3

Bharath_Raja

Reputation: 642

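You can loop over the rows and check each one:

import re
import pandas as pd

df['col'] = None

for i in range(len(df)):  # assumes the default 0..n-1 index
    # NaN never compares equal to anything (even np.nan), so test with pd.isna
    if pd.isna(df.loc[i, 'status']):
        df.loc[i, 'col'] = df.loc[i, 'ID']
    else:
        # First number in ID, decimal part included (4.3 rather than 4)
        df.loc[i, 'col'] = re.findall(r'\d+(?:\.\d+)?', str(df.loc[i, 'ID']))[0]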
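A quick check of why comparing with `== np.NaN` cannot detect a missing status, and why `pd.isna` is the safer test:

```python
import numpy as np
import pandas as pd

# NaN is never equal to anything, itself included, so an == test never fires.
print(np.nan == np.nan)   # False

# pd.isna treats both NaN and None as missing.
print(pd.isna(np.nan), pd.isna(None))   # True True

# The vectorised equivalent over a whole column:
status = pd.Series([np.nan, "yes", None])
print(status.isna().tolist())   # [True, False, True]
```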

Upvotes: 1
