Daneel Olivaw
Daneel Olivaw

Reputation: 51

Define a new column in pandas dataframe dynamically

Say I define the following dataframe:

df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
print(df.head())

and I want to generate a new column by using some condition on the existing column(s)

df['XXX'] = df.AAA < 6
print(df.head())

where the output generated would be:

     AAA BBB CCC  XXX
0    4   10  100  True
1    5   20   50  True
2    6   30  -30  False
3    7   40  -50  False

I'd like to be able to generate these conditions dynamically, for example by passing something like

"AAA < 6"

or

"AAA < 4 & BBB == 10"

and have these conditions evaluated accordingly.

Does pandas provide some mechanism to allow something like this? I know eval() could be an alternative, but also note that the above strings would also need to be given the df as a context, ie: "AAA < 6" => "df.AAA < 6"

Alternatively, could someone recommend me a package which would allow me to define these conditions and then generate a lambda or similar which could be used to evaluate?

Thank you

Upvotes: 3

Views: 239

Answers (2)

jezrael
jezrael

Reputation: 862511

Use DataFrame.eval:

df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]})
df['a'] = df.eval("AAA < 5 & BBB == 10")
df['b'] = df.eval("AAA < 6")
df['c'] = df.eval("AAA < 4 & BBB == 10")
print (df)

   AAA  BBB  CCC      a      b      c
0    4   10  100   True   True  False
1    5   20   50  False   True  False
2    6   30  -30  False  False  False
3    7   40  -50  False  False  False

More information about using this function with alternatives is Dynamic Expression Evaluation in pandas using pd.eval()

Upvotes: 3

yatu
yatu

Reputation: 88226

You could use pandas' eval, which can evaluate a string describing operations:

cond = "AAA < 6"
df.eval(cond)

0     True
1     True
2    False
3    False
Name: AAA, dtype: bool

cond = "AAA < 6"
df.assign(XXX = df.eval(cond))

 AAA  BBB  CCC    XXX
0    4   10  100   True
1    5   20   50   True
2    6   30  -30  False
3    7   40  -50  False

cond = "AAA < 4 & BBB == 10"
df.assign(XXX = df.eval(cond))

AAA  BBB  CCC    XXX
0    4   10  100  False
1    5   20   50  False
2    6   30  -30  False
3    7   40  -50  False

Upvotes: 3

Related Questions