Reputation: 4604
I have two data frames. Main and auxiliary. The main data frame is big. Auxiliary is small. I want to compare both and produce True/False in the Main data frame.
My code:
mdf =
A B C
0 3 0 -2
1 -3 -4 -5
2 4 -3 5
3 1 -8 1
adf =
A_low A_up B_low B_up C_low C_up
0 -2 2 -6 -2 -4 4
# up limit columns
ul_cols = ['A_up','B_up','C_up']
# low limit columns
ll_cols = ['A_low','B_low','C_low']
# output columns
op_cols = [i+'_op' for i in mdf.columns]
# compute output column
mdf[op_cols] = np.where((mdf<adf[ul_cols])&(mdf>adf[ll_cols]), True, False)
Present output:
ValueError: Can only compare identically-labeled DataFrame objects
Expected output:
mdf =
A B C A_op B_op C_op
0 3 0 -2 False False True
1 -3 -4 -5 False True False
2 4 -3 5 False True False
3 1 -8 1 True False True
Upvotes: 1
Views: 141
Reputation: 4604
After a series of trials, I landed at this answer. This what I did to get the answer
Problem
mdf =
A B C
0 3 0 -2
1 -3 -4 -5
2 4 -3 5
3 1 -8 1
adf =
A_low A_up B_low B_up C_low C_up
0 -2 2 -6 -2 -4 4
# up limit columns
ul_cols = ['A_up','B_up','C_up']
# low limit columns
ll_cols = ['A_low','B_low','C_low']
# output columns
op_cols = [i+'_op' for i in mdf.columns]
My answer:
## adf is one row dataframe. Repeat the rows to match main df
adf = adf.loc[adf.index.repeat(len(mdf))]
# compute output column
mdf[op_cols] = np.where((mdf<adf[ul_cols].values)&(mdf>adf[ll_cols].values), True, False)
This gave me the desired answer
Upvotes: 0
Reputation: 35686
Try with reformatting adf
:
adf.columns = adf.columns.str.split('_', expand=True).swaplevel(0, 1)
adf
:
low up low up low up
A A B B C C
0 -2 2 -6 -2 -4 4
Then with broadcasted comparison to create a new dataframe:
new_df = (adf['low'].to_numpy() <= mdf) & (mdf <= adf['up'].to_numpy())
new_df
:
A B C
0 False False True
1 False True False
2 False True False
3 True False True
Then join
back with suffix:
mdf = mdf.join(new_df, rsuffix='_op')
mdf
:
A B C A_op B_op C_op
0 3 0 -2 False False True
1 -3 -4 -5 False True False
2 4 -3 5 False True False
3 1 -8 1 True False True
All Together:
adf.columns = adf.columns.str.split('_', expand=True).swaplevel(0, 1)
mdf = mdf.join((adf['low'].to_numpy() <= mdf) &
(mdf <= adf['up'].to_numpy()), rsuffix='_op')
Another option with filter
instead of creating a multi-index:
mdf = mdf.join((adf.filter(like='_low').to_numpy() <= mdf) &
(mdf <= adf.filter(like='_up').to_numpy()), rsuffix='_op')
mdf
:
A B C A_op B_op C_op
0 3 0 -2 False False True
1 -3 -4 -5 False True False
2 4 -3 5 False True False
3 1 -8 1 True False True
DataFrames Used:
mdf = pd.DataFrame({'A': [3, -3, 4, 1],
'B': [0, -4, -3, -8],
'C': [-2, -5, 5, 1]})
adf = pd.DataFrame({'A_low': [-2], 'A_up': [2],
'B_low': [-6], 'B_up': [-2],
'C_low': [-4], 'C_up': [4]})
Upvotes: 1