Mainland
Mainland

Reputation: 4604

Python Compare two different size dataframes on If condition

I have two data frames. Main and auxiliary. The main data frame is big. Auxiliary is small. I want to compare both and produce True/False in the Main data frame.

My code:

mdf = 
    A  B  C
0   3  0 -2
1  -3 -4 -5
2   4 -3  5
3   1 -8  1

adf = 
    A_low  A_up  B_low  B_up  C_low  C_up
0    -2      2    -6     -2    -4     4

# up limit columns
ul_cols = ['A_up','B_up','C_up']
# low limit columns
ll_cols = ['A_low','B_low','C_low']
# output columns
op_cols = [i+'_op' for i in mdf.columns]
# compute output column
mdf[op_cols] = np.where((mdf<adf[ul_cols])&(mdf>adf[ll_cols]), True, False)

Present output:

ValueError: Can only compare identically-labeled DataFrame objects

Expected output:

mdf = 
    A  B  C  A_op  B_op  C_op  
0   3  0 -2  False False True
1  -3 -4 -5  False True  False
2   4 -3  5  False True  False
3   1 -8  1  True  False True

Upvotes: 1

Views: 141

Answers (2)

Mainland
Mainland

Reputation: 4604

After a series of trials, I landed at this answer. This what I did to get the answer

Problem

mdf = 
    A  B  C
0   3  0 -2
1  -3 -4 -5
2   4 -3  5
3   1 -8  1

adf = 
    A_low  A_up  B_low  B_up  C_low  C_up
0    -2      2    -6     -2    -4     4

# up limit columns
ul_cols = ['A_up','B_up','C_up']
# low limit columns
ll_cols = ['A_low','B_low','C_low']
# output columns
op_cols = [i+'_op' for i in mdf.columns]

My answer:

## adf is one row dataframe. Repeat the rows to match main df
adf = adf.loc[adf.index.repeat(len(mdf))]

# compute output column
mdf[op_cols] = np.where((mdf<adf[ul_cols].values)&(mdf>adf[ll_cols].values), True, False)

This gave me the desired answer

Upvotes: 0

Henry Ecker
Henry Ecker

Reputation: 35686

Try with reformatting adf:

adf.columns = adf.columns.str.split('_', expand=True).swaplevel(0, 1)

adf:

  low up low up low up
    A  A   B  B   C  C
0  -2  2  -6 -2  -4  4

Then with broadcasted comparison to create a new dataframe:

new_df = (adf['low'].to_numpy() <= mdf) & (mdf <= adf['up'].to_numpy())

new_df:

       A      B      C
0  False  False   True
1  False   True  False
2  False   True  False
3   True  False   True

Then join back with suffix:

mdf = mdf.join(new_df, rsuffix='_op')

mdf:

   A  B  C   A_op   B_op   C_op
0  3  0 -2  False  False   True
1 -3 -4 -5  False   True  False
2  4 -3  5  False   True  False
3  1 -8  1   True  False   True

All Together:

adf.columns = adf.columns.str.split('_', expand=True).swaplevel(0, 1)
mdf = mdf.join((adf['low'].to_numpy() <= mdf) &
               (mdf <= adf['up'].to_numpy()), rsuffix='_op')

Another option with filter instead of creating a multi-index:

mdf = mdf.join((adf.filter(like='_low').to_numpy() <= mdf) &
               (mdf <= adf.filter(like='_up').to_numpy()), rsuffix='_op')

mdf:

   A  B  C   A_op   B_op   C_op
0  3  0 -2  False  False   True
1 -3 -4 -5  False   True  False
2  4 -3  5  False   True  False
3  1 -8  1   True  False   True

DataFrames Used:

mdf = pd.DataFrame({'A': [3, -3, 4, 1],
                    'B': [0, -4, -3, -8],
                    'C': [-2, -5, 5, 1]})
adf = pd.DataFrame({'A_low': [-2], 'A_up': [2],
                    'B_low': [-6], 'B_up': [-2],
                    'C_low': [-4], 'C_up': [4]})

Upvotes: 1

Related Questions