Piyush Singh
Piyush Singh

Reputation: 23

Retrieving the columns based on a condition in dataframe

Input Dataframe:

 id typeofAddress       city     state
  1          Home      Kolkata     WB
  2          Home     Columbus     OH
  2        Office    SanFrancisco  CA
  4        Office    Chicago       IL
  4        Home      Chicago       IL
  4        Hotel     Chicago       IL
  4        Other     Chicago       IL
  5
  6        Home                    NY
  7        Office    Edison        NJ
  8        Office    Chicago       IL
  8        Home          
  9        Home
  9        Office
  10       Hotel     Chicago       IL
   

Expected Output:

  id typeofAddress       city     state
  1          Home      Kolkata     WB
  2          Home     Columbus     OH
  4          Home     Chicago      IL
  5
  6          Home                  NY
  7        Office    Edison        NJ
  8        Office    Chicago       IL    
  9
  10   

Actual Output

 id typeofAddress       city     state
  1          Home      Kolkata     WB
  2          Home     Columbus     OH
  4          Home     Chicago      IL
  5
  6          Home                  NY
  7        Office    Edison        NJ
  8        Home    Chicago       IL    
  9        Home    
  10       Hotel

Explanation :: for id 10 since it is neither Home or Office this row should be picked without anything in typeOfaddress or city or state. The condition is pick the row if typeOfaddress == Home and city not empty otherwise for that id check if typeofaddrees == Office and city not empty otherwise for that id it would be empty .for any other type of address it should be empty

Upvotes: 0

Views: 73

Answers (2)

Corralien
Corralien

Reputation: 120391

Create a rank column according your priorities:

condlist = [df['typeofAddress'].eq('Home') & df['city'].ne(''),
            df['typeofAddress'].eq('Office') & df['city'].ne('')]
rank = np.select(condlist, choicelist=[1, 2], default=3)

out = df.assign(rank=rank).sort_values('rank') \
        .groupby('id').first() \
        .drop(columns='rank').reset_index()

Output:

>>> out
   id typeofAddress          city state
0   1          Home       Kolkata    WB
1   2          Home      Columbus    OH
2   3        Office  SanFrancisco    CA

Upvotes: 1

SergioCK
SergioCK

Reputation: 11

You can solve this using boolean masks on your DataFrame. You can google "boolean masking with pandas" for more details.

import pandas as pd
d={'typeofAddress':['Home','Office','Home','Home','Office'],'city':['Kolkata','Calcutta','Columbus','','SanFrancisco'],'state':['WB','WB','OH','','CA']}
df=pd.DataFrame(d)
output=df[((df['typeofAddress']=='Home')&(df['city']!=''))|(df['typeofAddress']=='Office')]
output

Upvotes: 1

Related Questions