Reputation: 340
I have a following dataframe df_address containing addresses of students
student_id address_type Address City
1 R 6th street MPLS
1 P 10th street SE Chicago
1 E 10th street SE Chicago
2 P Washington ST Boston
2 E Essex St NYC
3 E 1040 Taft Blvd Dallas
4 R 24th street NYC
4 P 8th street SE Chicago
5 T 10 Riverside Ave Boston
6 20th St NYC
Each student can have multiple address types:
R stands for "Residential",P for "Permanent" ,E for "Emergency",T for "Temporary" and addr_type can also be blank
I want to populate "IsPrimaryAddress" columns based on the following logic:
If for particular student if address_type R exists then "Yes" should be written in front of address_type "R" in the IsPrimaryAddress column and "No" should be written in front of other address types for that particular student_id.
if address_type R doesn't exist but P exists then IsPrimaryAddress='Yes' for 'P' and 'No' for rest of the types
if neither P or R exists,but E exists then IsPrimaryAddress='Yes' for 'E' if P,R or E don't exist,but 'T' exists then IsPrimaryAddress='Yes' for 'T' Resultant dataframe would look like this:
student_id address_type Address City IsPrimaryAddress
1 R 6th street MPLS Yes
1 P 10th street SE Chicago No
1 E 10th street SE Chicago No
2 P Washington ST Boston Yes
2 E Essex St NYC No
3 E 1040 Taft Blvd Dallas Yes
4 R 24th street NYC Yes
4 P 8th street SE Chicago No
5 T 10 Riverside Ave Boston Yes
6 20th St NYC Yes
How can I achieve this?I tried rank and cumcount functions on address_type but couldn't get them work.
Upvotes: 1
Views: 124
Reputation: 323226
First using Categorical
make the address_type can be sort customized
df.address_type=pd.Categorical(df.address_type,['R','P','E','T',''],ordered=True)
df=df.sort_values('address_type') # the sort the values
df['new']=(df.groupby('student_id').address_type.transform('first')==df.address_type).map({True:'Yes',False:'No'}) # since we sorted the value , so the first value of each group is the one we need to mark as Yes
df=df.sort_index() # sort the index order back to the original df
student_id address_type new
0 1 R Yes
1 1 P No
2 1 E No
3 2 P Yes
4 2 E No
5 3 E Yes
6 4 R Yes
7 4 P No
8 5 T Yes
9 6 Yes
Upvotes: 1