TLanni
TLanni

Reputation: 340

pandas groupby with condition on one column to populate another column

I have a following dataframe df_address containing addresses of students

student_id     address_type     Address          City      
 1                R              6th street      MPLS              
 1                P              10th street SE  Chicago           
 1                E              10th street SE  Chicago           
 2                P              Washington ST   Boston            
 2                E              Essex St        NYC               
 3                E              1040 Taft Blvd  Dallas            
 4                R              24th street      NYC             
 4                P              8th street SE  Chicago           
 5                T              10 Riverside Ave Boston
 6                                20th St          NYC          

Each student can have multiple address types:

R stands for "Residential",P for "Permanent" ,E for "Emergency",T for "Temporary" and addr_type can also be blank

I want to populate "IsPrimaryAddress" columns based on the following logic:

If for particular student if address_type R exists then "Yes" should be written in front of address_type "R" in the IsPrimaryAddress column and "No" should be written in front of other address types for that particular student_id.

if address_type R doesn't exist but P exists then IsPrimaryAddress='Yes' for 'P' and 'No' for rest of the types

if neither P or R exists,but E exists then IsPrimaryAddress='Yes' for 'E' if P,R or E don't exist,but 'T' exists then IsPrimaryAddress='Yes' for 'T' Resultant dataframe would look like this:

student_id     address_type     Address          City      IsPrimaryAddress
 1                R              6th street      MPLS              Yes
 1                P              10th street SE  Chicago           No
 1                E              10th street SE  Chicago           No
 2                P              Washington ST   Boston            Yes
 2                E              Essex St        NYC               No
 3                E              1040 Taft Blvd  Dallas            Yes
 4                R              24th street      NYC             Yes
 4                P              8th street SE  Chicago           No
 5                T              10 Riverside Ave Boston          Yes
 6                                20th St          NYC           Yes

How can I achieve this?I tried rank and cumcount functions on address_type but couldn't get them work.

Upvotes: 1

Views: 124

Answers (1)

BENY
BENY

Reputation: 323226

First using Categorical make the address_type can be sort customized

df.address_type=pd.Categorical(df.address_type,['R','P','E','T',''],ordered=True)

df=df.sort_values('address_type') # the sort the values
df['new']=(df.groupby('student_id').address_type.transform('first')==df.address_type).map({True:'Yes',False:'No'}) # since we sorted the value , so the first value of each group is the one we need to mark as Yes
df=df.sort_index() # sort the index order back to the original df


   student_id address_type  new
0           1            R  Yes
1           1            P   No
2           1            E   No
3           2            P  Yes
4           2            E   No
5           3            E  Yes
6           4            R  Yes
7           4            P   No
8           5            T  Yes
9           6               Yes

Upvotes: 1

Related Questions