Reputation: 35
I have a dataframe that looks like this.
I want to add a column 'Correct_entry' that returns 'True' or 'False' based on the combination of values in the current columns. In total I have 24 possible combinations that are correct. I give two examples:
1.
If df['Hazard_type'] == 'Drought' & df['Card_type'] == 'Red block' & If['Round'] == 1 & df['Scenario'] == 'scenario A' & df['Payment_type'] == 'One payment (lump sum)'
If df['Hazard_type'] == 'Drought' & df['Card_type'] == 'Green block' & If['Round'] == 1 & df['Scenario'] == 'scenario A' & df['Payment_type'] == 'One payment (lump sum)'
I have 24 different combinations of values that are all correct and should equal 'True'. All other combinations should equal to 'False'.
What would be the best way to go through this set of data? How can I combine all these different statements?
I hope it is clear.
edit: As requested, the data in text format.
Hazard_type Card_type Round Scenario Payment_type
244 Drought Green block 2 scenario B Two payments (two consecutive sums)
643 Drought Red block 4 scenario A Two payments (two consecutive sums)
584 Drought Red block 4 scenario A One payment (lump sum)
242 Drought Red block 2 scenario B Two payments (two consecutive sums)
1039 Drought Green block 6 scenario A Two payments (two consecutive sums)
101 Flood Red block 1 scenario A Two payments (two consecutive sums)
Upvotes: 0
Views: 154
Reputation: 31236
np.nan
for transparencybool
s = 200
df = pd.DataFrame({"Hazard_type":np.random.choice(["Drought","Flood"],s),
"Card_type":np.random.choice(["Red block","Green block"],s),
"Round":np.random.randint(1,7,s),
"Scenario":np.random.choice(["scenario A","scenario B"],s),
"Payment_type":np.random.choice(["One payment (lump sum)","Two payments"],s)})
conditions = [
# condition 0
((df['Hazard_type'] == 'Drought') & (df['Card_type'] == 'Red block') & (df['Round'] == 1)
& (df['Scenario'] == 'scenario A') & (df['Payment_type'] == 'One payment (lump sum)')
),
# condition 1
((df['Hazard_type'] == 'Drought') & (df['Card_type'] == 'Green block') & (df['Round'] == 1) &
(df['Scenario'] == 'scenario A') & (df['Payment_type'] == 'One payment (lump sum)')
)]
df = df.assign(Correct_case=np.select(conditions, [c for c in range(len(conditions))], np.nan),
Correct_entry=lambda dfa: ~dfa.Correct_case.isna())
Hazard_type | Card_type | Round | Scenario | Payment_type | Correct_case | Correct_entry | |
---|---|---|---|---|---|---|---|
47 | Drought | Red block | 1 | scenario A | One payment (lump sum) | 0 | True |
50 | Drought | Red block | 1 | scenario A | One payment (lump sum) | 0 | True |
72 | Drought | Red block | 1 | scenario A | One payment (lump sum) | 0 | True |
104 | Drought | Red block | 1 | scenario A | One payment (lump sum) | 0 | True |
120 | Drought | Red block | 1 | scenario A | One payment (lump sum) | 0 | True |
170 | Drought | Red block | 1 | scenario A | One payment (lump sum) | 0 | True |
186 | Drought | Green block | 1 | scenario A | One payment (lump sum) | 1 | True |
Upvotes: 1
Reputation: 413
I would approach this by creating a series of boolean masks depending on the scenerio
# boolean masks
drought_mask = df['Hazard_type'] == 'Drought'
red_block_mask = df['Card_type'] == 'Red block'
green_block_mask = df['Card_type'] == 'Green block'
round_1_mask = df['Round'] == 1
scenario_a_mask = df['Scenario'] == 'scenario A'
one_payment_mask = df['Payment_type'] == 'One payment (lump sum)'
# scenerio 1
scenario_1_mask = df.loc[(drought_mask & red_block_mask & scenario_a_mask) & (round_1_mask & one_payment_mask & one_payment_mask)]
# scenario 2
scenario_2_mask = df.loc[(drought_mask & green_block_mask & scenario_a_mask) & (round_1_mask & one_payment_mask & one_payment_mask)]
# combining scenerios
df['Correct_entry'] = df.loc[scenario_1_mask | scenario_2_mask]
You can create a mask for every scenario and then combine them in the final correct_entry column using an OR operator (|). This should return a True
value for every row which is matches one of the 24 scenarios.
Upvotes: 0