Sid
Sid

Reputation: 23

Pandas calculate a new column from multiple other columns and subset of rows

I'm curious to see how do we calculate a new column from a dataframe based on other existing columns and subset of rows.

Example: I have a dataframe with different rooms 111, 222, 333, 444 and each room might have beds AAA, BBB. And I also have bed status per room as occupied or Free. I wanted to calculate a new column - "Active" with following criteria:

  1. Only one bed per room is active.
  2. Occupied bed takes first preference to be active in a room.
  3. If room has only one bed it must be active irrespective of bed status

Sample DataFrame:

   Room  Bed    Status
0   111  AAA  Occupied
1   111  BBB      Free
2   222  AAA      Free
3   333  AAA      Free
4   333  BBB  Occupied
5   444  BBB  Occupied

Expected Output:

   Room  Bed    Status  Active
0   111  AAA  Occupied    True
1   111  BBB      Free   False
2   222  AAA      Free    True
3   333  AAA      Free   False
4   333  BBB  Occupied    True
5   444  BBB  Occupied    True

I wanted to convert this to a dictionary and loop through it but have a strong feeling that this could be implemented through pandas in-built functions.

TIA

Upvotes: 2

Views: 129

Answers (1)

jezrael
jezrael

Reputation: 863166

Use:

print (df)
   Room  Bed    Status
0   111  AAA  Occupied
1   111  BBB      Free
2   222  AAA      Free
3   333  AAA      Free
4   333  BBB  Occupied
5   444  BBB  Occupied
6   555  AAA  Occupied
7   555  BBB  Occupied

#test Occupied
m1 = df['Status'].eq('Occupied')
#test first active if multiple active per Room (555)
m2 = (~df.assign(Status = df['Status'].where(df['Status'].eq('Occupied')))
         .duplicated(['Room','Status']))
#test if one bed room
m3 = df['Room'].map(df['Room'].value_counts()).eq(1)

#chain masks with & for bitwise AND and | for bitwise OR
df['Active'] = m1 & m2 | m3
print (df)
   Room  Bed    Status  Active
0   111  AAA  Occupied    True
1   111  BBB      Free   False
2   222  AAA      Free    True
3   333  AAA      Free   False
4   333  BBB  Occupied    True
5   444  BBB  Occupied    True
6   555  AAA  Occupied    True
7   555  BBB  Occupied   False

EDIT:

#test Occupied
m1 = df['Status'].eq('Occupied')
#test first active if multiple active per Room (555)
m2 = (~df.assign(Status = df['Status'].where(df['Status'].eq('Occupied')))
         .duplicated(['Room','Status']))

#test Free
m4 = df['Status'].eq('Free')
#test first active if multiple active per Room (666)
m5 = (~df.assign(Status = df['Status'].where(m4))
         .duplicated(['Room','Status']))

#test if all bed room has Free
m6 = m4.groupby(df['Room']).transform('all')

#chain masks with & for bitwise AND and | for bitwise OR
df['Active'] =  (m5 & m6) | (m1 & m2)
print (df)
   Room  Bed    Status  Active
0   111  AAA  Occupied    True
1   111  BBB      Free   False
2   222  AAA      Free    True
3   333  AAA      Free   False
4   333  BBB  Occupied    True
5   444  BBB  Occupied    True
6   555  AAA  Occupied    True
7   555  BBB  Occupied   False
8   666  AAA      Free    True
9   666  BBB      Free   False

Upvotes: 1

Related Questions