yabchexu
yabchexu

Reputation: 593

parse string as conditions to filter pandas dataframe

Hi I have a pandas dataframe, I have a column A.

data = pandas.DataFrame()
data['A']= [numpy.random.choice([1,2,3,4,5,6]) for i in range(10)]

I want to add a column B condition on A (when A =1 then B=0, when A>5 then B=1) instead of using:

data.loc[data['A']==1,'B']=0
data.loc[data['A']>5, 'B']=1

Here I want to create a function to do this given the condition as a dict: {'A=1':0,'A>5':1} so I could use add_column({'A=1':0,'A>5':1}, 'B') to do the code above. I am thinking it is tricky to deal with the operators, any good idea?

def add_column(condition_dict, NewColumnName):
    pass

Upvotes: 4

Views: 1376

Answers (1)

niraj
niraj

Reputation: 18208

While there may be efficient ways to do it, one possible way might be to use eval function.

Creating input df:

import pandas as pd
import numpy as np

data = pd.DataFrame()
data['A']= [np.random.choice([1,2,3,4,5,6]) for i in range(10)]
print(data)

Input df:

   A
0  4
1  3
2  3
3  1
4  1
5  2
6  3
7  6
8  2
9  1

Now, a function is created such that it iterates through each row of dataframe and condition_dict and when row evaluation matches value is stored in list for corresponding row which is updated for new column. If none of the condition matches then with default it will None:

def add_column(df, condition_dict, NewColumnName):
    new_values = []
    for index, row in df.iterrows():
        # if none of the condition matches then put default value
        default_value = None
        # iterate through each condition to check if any matches
        for key, value in condition_dict.items():
            expression = 'row.' + key
            if(eval(expression)):
                default_value = value
        # add corresponding rows new value for new column        
        new_values.append(default_value)

    df[NewColumnName] = new_values

Now, to call the function:

add_column(data, {'A==1':0, 'A>5':1}, 'B')
print(data)

Output:

   A    B
0  4  NaN
1  3  NaN
2  3  NaN
3  1  0.0
4  1  0.0
5  2  NaN
6  3  NaN
7  6  1.0
8  2  NaN
9  1  0.0

Upvotes: 3

Related Questions