Reputation: 23
I'm curious to see how do we calculate a new column from a dataframe based on other existing columns and subset of rows.
Example: I have a dataframe with different rooms 111, 222, 333, 444 and each room might have beds AAA, BBB. And I also have bed status per room as occupied or Free. I wanted to calculate a new column - "Active" with following criteria:
Sample DataFrame:
Room Bed Status
0 111 AAA Occupied
1 111 BBB Free
2 222 AAA Free
3 333 AAA Free
4 333 BBB Occupied
5 444 BBB Occupied
Expected Output:
Room Bed Status Active
0 111 AAA Occupied True
1 111 BBB Free False
2 222 AAA Free True
3 333 AAA Free False
4 333 BBB Occupied True
5 444 BBB Occupied True
I wanted to convert this to a dictionary and loop through it but have a strong feeling that this could be implemented through pandas in-built functions.
TIA
Upvotes: 2
Views: 129
Reputation: 863166
Use:
print (df)
Room Bed Status
0 111 AAA Occupied
1 111 BBB Free
2 222 AAA Free
3 333 AAA Free
4 333 BBB Occupied
5 444 BBB Occupied
6 555 AAA Occupied
7 555 BBB Occupied
#test Occupied
m1 = df['Status'].eq('Occupied')
#test first active if multiple active per Room (555)
m2 = (~df.assign(Status = df['Status'].where(df['Status'].eq('Occupied')))
.duplicated(['Room','Status']))
#test if one bed room
m3 = df['Room'].map(df['Room'].value_counts()).eq(1)
#chain masks with & for bitwise AND and | for bitwise OR
df['Active'] = m1 & m2 | m3
print (df)
Room Bed Status Active
0 111 AAA Occupied True
1 111 BBB Free False
2 222 AAA Free True
3 333 AAA Free False
4 333 BBB Occupied True
5 444 BBB Occupied True
6 555 AAA Occupied True
7 555 BBB Occupied False
EDIT:
#test Occupied
m1 = df['Status'].eq('Occupied')
#test first active if multiple active per Room (555)
m2 = (~df.assign(Status = df['Status'].where(df['Status'].eq('Occupied')))
.duplicated(['Room','Status']))
#test Free
m4 = df['Status'].eq('Free')
#test first active if multiple active per Room (666)
m5 = (~df.assign(Status = df['Status'].where(m4))
.duplicated(['Room','Status']))
#test if all bed room has Free
m6 = m4.groupby(df['Room']).transform('all')
#chain masks with & for bitwise AND and | for bitwise OR
df['Active'] = (m5 & m6) | (m1 & m2)
print (df)
Room Bed Status Active
0 111 AAA Occupied True
1 111 BBB Free False
2 222 AAA Free True
3 333 AAA Free False
4 333 BBB Occupied True
5 444 BBB Occupied True
6 555 AAA Occupied True
7 555 BBB Occupied False
8 666 AAA Free True
9 666 BBB Free False
Upvotes: 1