Reputation: 23
Input Dataframe:
id typeofAddress city state
1 Home Kolkata WB
2 Home Columbus OH
2 Office SanFrancisco CA
4 Office Chicago IL
4 Home Chicago IL
4 Hotel Chicago IL
4 Other Chicago IL
5
6 Home NY
7 Office Edison NJ
8 Office Chicago IL
8 Home
9 Home
9 Office
10 Hotel Chicago IL
Expected Output:
id typeofAddress city state
1 Home Kolkata WB
2 Home Columbus OH
4 Home Chicago IL
5
6 Home NY
7 Office Edison NJ
8 Office Chicago IL
9
10
Actual Output
id typeofAddress city state
1 Home Kolkata WB
2 Home Columbus OH
4 Home Chicago IL
5
6 Home NY
7 Office Edison NJ
8 Home Chicago IL
9 Home
10 Hotel
Explanation :: for id 10 since it is neither Home or Office this row should be picked without anything in typeOfaddress or city or state. The condition is pick the row if typeOfaddress == Home and city not empty otherwise for that id check if typeofaddrees == Office and city not empty otherwise for that id it would be empty .for any other type of address it should be empty
Upvotes: 0
Views: 73
Reputation: 120391
Create a rank column according your priorities:
condlist = [df['typeofAddress'].eq('Home') & df['city'].ne(''),
df['typeofAddress'].eq('Office') & df['city'].ne('')]
rank = np.select(condlist, choicelist=[1, 2], default=3)
out = df.assign(rank=rank).sort_values('rank') \
.groupby('id').first() \
.drop(columns='rank').reset_index()
Output:
>>> out
id typeofAddress city state
0 1 Home Kolkata WB
1 2 Home Columbus OH
2 3 Office SanFrancisco CA
Upvotes: 1
Reputation: 11
You can solve this using boolean masks on your DataFrame. You can google "boolean masking with pandas" for more details.
import pandas as pd
d={'typeofAddress':['Home','Office','Home','Home','Office'],'city':['Kolkata','Calcutta','Columbus','','SanFrancisco'],'state':['WB','WB','OH','','CA']}
df=pd.DataFrame(d)
output=df[((df['typeofAddress']=='Home')&(df['city']!=''))|(df['typeofAddress']=='Office')]
output
Upvotes: 1