Reputation: 593
Hi I have a pandas dataframe, I have a column A.
data = pandas.DataFrame()
data['A']= [numpy.random.choice([1,2,3,4,5,6]) for i in range(10)]
I want to add a column B condition on A (when A =1 then B=0, when A>5 then B=1) instead of using:
data.loc[data['A']==1,'B']=0
data.loc[data['A']>5, 'B']=1
Here I want to create a function to do this given the condition as a dict: {'A=1':0,'A>5':1} so I could use add_column({'A=1':0,'A>5':1}, 'B') to do the code above. I am thinking it is tricky to deal with the operators, any good idea?
def add_column(condition_dict, NewColumnName):
pass
Upvotes: 4
Views: 1376
Reputation: 18208
While there may be efficient ways to do it, one possible way might be to use eval
function.
Creating input df
:
import pandas as pd
import numpy as np
data = pd.DataFrame()
data['A']= [np.random.choice([1,2,3,4,5,6]) for i in range(10)]
print(data)
Input df
:
A
0 4
1 3
2 3
3 1
4 1
5 2
6 3
7 6
8 2
9 1
Now, a function is created such that it iterates
through each row of dataframe
and condition_dict
and when row
evaluation matches value
is stored in list for corresponding row which is updated for new column
. If none of the condition matches then with default it will None
:
def add_column(df, condition_dict, NewColumnName):
new_values = []
for index, row in df.iterrows():
# if none of the condition matches then put default value
default_value = None
# iterate through each condition to check if any matches
for key, value in condition_dict.items():
expression = 'row.' + key
if(eval(expression)):
default_value = value
# add corresponding rows new value for new column
new_values.append(default_value)
df[NewColumnName] = new_values
Now, to call the function:
add_column(data, {'A==1':0, 'A>5':1}, 'B')
print(data)
Output:
A B
0 4 NaN
1 3 NaN
2 3 NaN
3 1 0.0
4 1 0.0
5 2 NaN
6 3 NaN
7 6 1.0
8 2 NaN
9 1 0.0
Upvotes: 3