Chethan
Chethan

Reputation: 611

perform operation on column of data frame based on condition given to column in another data frame in pandas

I have a data frame df1:

df1 = pd.DataFrame([[40, 23, 22, 31], [12, 3, 11,23], [42, 16, 32, 34], [42, 13, 26, 34]], columns=['A', 'B', 'C', 'D'])

I have another data frame df2:

df2 = pd.DataFrame([["B","<20"],["A",">30"],["C","<40"],["D","<15"]], columns=["Column","Condition"])

Question: Select the data frame df1 based on any of the conditions present in df2. How to do it? Please help.

Expected Output example: For B Condition:

B_df = pd.DataFrame([3,16,13],columns=["B"])

For C Condition:

C_df = pd.DataFrame([22,11,32,26],columns=["C"])

Upvotes: 1

Views: 212

Answers (4)

Gillu13
Gillu13

Reputation: 958

You can use any boolean Series as a condition (provided its index match df1's index), e.g.:

cond = df1["A"]>30

print(pd.DataFrame(df1["A"][cond]))

For convenience, let consider a slightly different form of df2:

df2 = pd.DataFrame([["B","<",20],["A",">",30],["C","<",40],["D","<",15]], columns=["Column","Operator","Value"])

and a dictionary that stores operators:

import operator

operator_dict={
    ">": operator.gt,
    "<": operator.lt
    }

Then the following will do the job:

for index, row in df2.iterrows():
    cond = operator_dict[row["Operator"]](df1[row["Column"]],row["Value"])
    print(pd.DataFrame(df1[row["Column"]][cond]))

Upvotes: 1

Rob Raymond
Rob Raymond

Reputation: 31146

Build a query string and you can flexibly use one or more of the filters.

df1 = pd.DataFrame([[40, 23, 22, 31], [12, 3, 11,23], [42, 16, 32, 34], [42, 13, 26, 34]], columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame([["B","<20"],["A",">30"],["C","<40"],["D","<15"]], columns=["Column","Condition"])

def myfilter(col):
    if isinstance(col, str): col = [col]
    mask = df2.Column.isin(col)
    return df1.query(" and ".join([f"{c[0]}{c[1]}" for c in df2.loc[mask].values])).loc[:,col]

print(myfilter("B"))
print(myfilter("C"))
print(myfilter(["B","C"]))

output

    B
1   3
2  16
3  13
    C
0  22
1  11
2  32
3  26
    B   C
1   3  11
2  16  32
3  13  26

Upvotes: 1

gold_cy
gold_cy

Reputation: 14216

Another approach is to make functions to make this process repeatable.

OPS = {"<": "lt", ">": "gt"}

def get_conditions(df, idx):
    column = df.loc[idx].Column
    condition = df.loc[idx].Condition
    attr = condition[0]
    threshold = int(condition[1:])
    return (column, attr, threshold)

def get_matching(conditions, df):
    col, attr, thresh = conditions
    fn = getattr(pd.Series, OPS[attr])
    mask = fn(df[col], thresh)
    return df.loc[mask, col]


for num in range(len(df2)):
    conds = get_conditions(df2, num)
    print(conds[0])
    print("--------")
    print(get_matching(conds, df1), end='\n\n')


B
--------
1     3
2    16
3    13
Name: B, dtype: int64


A
--------
0    40
2    42
3    42
Name: A, dtype: int64


C
--------
0    22
1    11
2    32
3    26
Name: C, dtype: int64


D
--------
Series([], Name: D, dtype: int64)

Upvotes: 1

Andrej Kesely
Andrej Kesely

Reputation: 195418

You can use DataFrame.query for the task:

for c, cnd in zip(df2['Column'], df2['Condition']):
    print(pd.DataFrame(df1[c], columns=[c]).query('{}{}'.format(c, cnd)))
    print('-' * 80)

Prints four dataframes (based on conditions in df2):

    B
1   3
2  16
3  13
--------------------------------------------------------------------------------
    A
0  40
2  42
3  42
--------------------------------------------------------------------------------
    C
0  22
1  11
2  32
3  26
--------------------------------------------------------------------------------
Empty DataFrame
Columns: [D]
Index: []
--------------------------------------------------------------------------------

Upvotes: 3

Related Questions