Reputation: 438
I believe my problem is straightforward and there must be an easy way to solve it, but as I am quite new to Python, especially pandas, I could not sort it out on my own.
I made up the following dataframes, which represent a much simpler scenario than what I have been working on. The length of the explanation might look scary, but I am just trying to make it as clear as possible. So, please, consider:
import pandas as pd
data = pd.DataFrame({'Column 1': [300, 300, 450, 500, 500, 750, 600, 300, 150],
                     'Column 2': [100, 130, 230, 200, 300, 350, 600, 550, 530],
                     'Column 3': [250, 300, 400, 500, 700, 350, 750, 550, 600]})
df_validate_1 = pd.DataFrame({'Column 1': [1, 0, 1, 1, 1, 1, 1, 1, 1],
                              'Column 2': [1, 1, 1, 1, 1, 0, 0, 1, 1],
                              'Column 3': [0, 1, 1, 1, 1, 1, 1, 1, 0]})
df_validate_2 = pd.DataFrame({'Column 1': [1, 1, 1, 1, 1, 0, 1, 1, 1],
                              'Column 2': [1, 1, 0, 1, 1, 1, 0, 0, 0],
                              'Column 3': [0, 0, 1, 1, 0, 1, 1, 0, 0]})
df_validate_3 = pd.DataFrame({'Column 1': [1, 1, 1, 0, 0, 1, 1, 1, 1],
                              'Column 2': [1, 1, 1, 1, 1, 1, 1, 0, 0],
                              'Column 3': [0, 0, 0, 1, 1, 1, 1, 1, 1]})
I have one main dataframe and three validation dataframes, which contain only 1s and 0s (true or false). I am looking for a way to create a new dataframe out of 'data', conditional on the three validation dataframes: I keep an element of 'data' only if the corresponding row and column of all three validation dataframes holds a 1; if there is a zero in that position in any of the validation dataframes, I want NaN instead.
Explaining briefly what I want in practice: the first element of Column 2 in 'data' is 100. In df_validate_1, df_validate_2 and df_validate_3 the first element of the second column is 1, hence I pass the value to my new dataframe. On the other hand, the second element of Column 1 is 300, but in that position df_validate_1 has a 0 (even though df_validate_2 and df_validate_3 have a 1 there), therefore I want to write it as NaN.
I am expecting the following results:
Column 1 Column 2 Column 3
0 300 100 NaN
1 NaN 130 NaN
2 450 NaN NaN
3 NaN 200 500
4 NaN 300 NaN
5 NaN NaN 350
6 600 NaN 750
7 300 NaN NaN
8 150 NaN NaN
Naturally, my real problem is far more complex and this is a made-up example, so I am looking for an elegant solution that I can apply in a more robust scenario.
I hope I managed to be succinct and precise. I would really appreciate your help on this one!
Upvotes: 1
Views: 251
Reputation: 889
You can use np.where() (note this needs numpy imported):
>>> import numpy as np
>>> result_df = pd.DataFrame()
>>> result_df['Column 1'] = np.where((df_validate_1['Column 1'] == 1) & (df_validate_2['Column 1'] == 1) & (df_validate_3['Column 1'] == 1), data['Column 1'], np.nan)
>>> result_df['Column 2'] = np.where((df_validate_1['Column 2'] == 1) & (df_validate_2['Column 2'] == 1) & (df_validate_3['Column 2'] == 1), data['Column 2'], np.nan)
>>> result_df['Column 3'] = np.where((df_validate_1['Column 3'] == 1) & (df_validate_2['Column 3'] == 1) & (df_validate_3['Column 3'] == 1), data['Column 3'], np.nan)
>>> result_df
Column 1 Column 2 Column 3
0 300.0 100.0 NaN
1 NaN 130.0 NaN
2 450.0 NaN NaN
3 NaN 200.0 500.0
4 NaN 300.0 NaN
5 NaN NaN 350.0
6 600.0 NaN 750.0
7 300.0 NaN NaN
8 150.0 NaN NaN
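If you have many columns, the three near-identical np.where() calls can be collapsed into a loop over the column names. A sketch (assuming, as in the question, that all four frames share the same columns and index):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Column 1': [300, 300, 450, 500, 500, 750, 600, 300, 150],
                     'Column 2': [100, 130, 230, 200, 300, 350, 600, 550, 530],
                     'Column 3': [250, 300, 400, 500, 700, 350, 750, 550, 600]})
df_validate_1 = pd.DataFrame({'Column 1': [1, 0, 1, 1, 1, 1, 1, 1, 1],
                              'Column 2': [1, 1, 1, 1, 1, 0, 0, 1, 1],
                              'Column 3': [0, 1, 1, 1, 1, 1, 1, 1, 0]})
df_validate_2 = pd.DataFrame({'Column 1': [1, 1, 1, 1, 1, 0, 1, 1, 1],
                              'Column 2': [1, 1, 0, 1, 1, 1, 0, 0, 0],
                              'Column 3': [0, 0, 1, 1, 0, 1, 1, 0, 0]})
df_validate_3 = pd.DataFrame({'Column 1': [1, 1, 1, 0, 0, 1, 1, 1, 1],
                              'Column 2': [1, 1, 1, 1, 1, 1, 1, 0, 0],
                              'Column 3': [0, 0, 0, 1, 1, 1, 1, 1, 1]})

validators = [df_validate_1, df_validate_2, df_validate_3]
result_df = pd.DataFrame(index=data.index)
for col in data.columns:
    # keep the value only where every validator has a 1 in this cell
    ok = np.logical_and.reduce([v[col] == 1 for v in validators])
    result_df[col] = np.where(ok, data[col], np.nan)
```

This produces the same result as the three explicit calls, and the `validators` list can grow without touching the loop body.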
Upvotes: 0
Reputation: 1167
I'm using pd.concat, then a groupby to get the min value per index, which is 0 if any of the validation frames has a 0 in that cell and 1 otherwise. This can be used as a mask for the original dataframe (using the names from the question):
dfm = pd.concat([df_validate_1, df_validate_2, df_validate_3])
df = data[dfm.groupby(dfm.index).min().astype(bool)]
print(df)
Column 1 Column 2 Column 3
0 300.0 100.0 NaN
1 NaN 130.0 NaN
2 450.0 NaN NaN
3 NaN 200.0 500.0
4 NaN 300.0 NaN
5 NaN NaN 350.0
6 600.0 NaN 750.0
7 300.0 NaN NaN
8 150.0 NaN NaN
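Since the question mentions applying this in a more robust scenario, the same idea can be written for an arbitrary list of validation frames by folding an elementwise AND over them and handing the result to DataFrame.where. A sketch using the question's frames:

```python
from functools import reduce

import pandas as pd

data = pd.DataFrame({'Column 1': [300, 300, 450, 500, 500, 750, 600, 300, 150],
                     'Column 2': [100, 130, 230, 200, 300, 350, 600, 550, 530],
                     'Column 3': [250, 300, 400, 500, 700, 350, 750, 550, 600]})
df_validate_1 = pd.DataFrame({'Column 1': [1, 0, 1, 1, 1, 1, 1, 1, 1],
                              'Column 2': [1, 1, 1, 1, 1, 0, 0, 1, 1],
                              'Column 3': [0, 1, 1, 1, 1, 1, 1, 1, 0]})
df_validate_2 = pd.DataFrame({'Column 1': [1, 1, 1, 1, 1, 0, 1, 1, 1],
                              'Column 2': [1, 1, 0, 1, 1, 1, 0, 0, 0],
                              'Column 3': [0, 0, 1, 1, 0, 1, 1, 0, 0]})
df_validate_3 = pd.DataFrame({'Column 1': [1, 1, 1, 0, 0, 1, 1, 1, 1],
                              'Column 2': [1, 1, 1, 1, 1, 1, 1, 0, 0],
                              'Column 3': [0, 0, 0, 1, 1, 1, 1, 1, 1]})

validators = [df_validate_1, df_validate_2, df_validate_3]
# elementwise AND across any number of 0/1 frames
mask = reduce(lambda a, b: a & b, (v.astype(bool) for v in validators))
# where() keeps values where the mask is True and writes NaN elsewhere
result = data.where(mask)
```

The list of validators can have any length; only the `validators` line changes.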
Upvotes: 1
Reputation: 62383
Multiplying everything together gives 0 wherever any of the validation frames has a 0; you can leave those cells as 0 or replace them with np.nan:
import pandas as pd
import numpy as np
data_updated = data * df_validate_1 * df_validate_2 * df_validate_3
data_updated.replace(0, np.nan, inplace=True)
   Column 1  Column 2  Column 3
0     300.0     100.0       NaN
1       NaN     130.0       NaN
2     450.0       NaN       NaN
3       NaN     200.0     500.0
4       NaN     300.0       NaN
5       NaN       NaN     350.0
6     600.0       NaN     750.0
7     300.0       NaN       NaN
8     150.0       NaN       NaN
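One caveat worth noting: this trick assumes the data itself never contains a legitimate 0, since a valid 0 would also be turned into NaN by the replace step. If zeros can be real values, masking on the product of the validators alone avoids the problem. A sketch:

```python
import pandas as pd

data = pd.DataFrame({'Column 1': [300, 300, 450, 500, 500, 750, 600, 300, 150],
                     'Column 2': [100, 130, 230, 200, 300, 350, 600, 550, 530],
                     'Column 3': [250, 300, 400, 500, 700, 350, 750, 550, 600]})
df_validate_1 = pd.DataFrame({'Column 1': [1, 0, 1, 1, 1, 1, 1, 1, 1],
                              'Column 2': [1, 1, 1, 1, 1, 0, 0, 1, 1],
                              'Column 3': [0, 1, 1, 1, 1, 1, 1, 1, 0]})
df_validate_2 = pd.DataFrame({'Column 1': [1, 1, 1, 1, 1, 0, 1, 1, 1],
                              'Column 2': [1, 1, 0, 1, 1, 1, 0, 0, 0],
                              'Column 3': [0, 0, 1, 1, 0, 1, 1, 0, 0]})
df_validate_3 = pd.DataFrame({'Column 1': [1, 1, 1, 0, 0, 1, 1, 1, 1],
                              'Column 2': [1, 1, 1, 1, 1, 1, 1, 0, 0],
                              'Column 3': [0, 0, 0, 1, 1, 1, 1, 1, 1]})

# product of the 0/1 validators is 1 only where all three agree
mask = (df_validate_1 * df_validate_2 * df_validate_3).astype(bool)
# NaN wherever any validator is 0; values of `data` are untouched otherwise
data_updated = data.where(mask)
```

With this variant, only the validation frames decide which cells become NaN, never the values in `data`.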
Upvotes: 1