kakoli
kakoli

Reputation: 447

Filtering dataframe with Multiindex in Pandas

First dataframe d1: Index columns are State and City

                      **2008q2         2009q2          Price Ratio**
**State   City**
Alabama   Montevallo  119000.000000  125200.000000     0.950479
          Tuscaloosa  139366.666667  136933.333333     1.017770
Alaska    Fairbanks   254733.333333  225833.333333     1.127970

Second dataframe d2 :

    State    City
0  Alabama    Auburn
1  Alabama    Florence
2  Alabama    Tuscaloosa  

I need to filter out/remove rows in d1 where [State,City] is not in d2.

Upvotes: 1

Views: 69

Answers (2)

kakoli
kakoli

Reputation: 447

Did the following steps :

  1. outer join between d1 and d2 with Indicator=True
  2. extract the left_only entries from the outer join to get the rows_in_df1_not_in_df2.

                      **2008q2         2009q2          Price Ratio**
    

    State City Alabama Montevallo 119000.000000 125200.000000 0.950479 Alaska Fairbanks 254733.333333 225833.333333 1.127970

Would like to know if there is any other way like boolean masking?

Upvotes: 0

jezrael
jezrael

Reputation: 863701

You can use join with inner join:

df = df2.join(df1, on=['State','City'], how='inner')
print (df)
     State        City         2008q2         2009q2  Price Ratio
2  Alabama  Tuscaloosa  139366.666667  136933.333333      1.01777

Another solution with merge, parameter on can be omit if only intersection of same columns in both df:

df = pd.merge(df1.reset_index(), df2)
#df = pd.merge(df1.reset_index(), df2, on=['State','City'])
print (df)
     State        City         2008q2         2009q2  Price Ratio
0  Alabama  Tuscaloosa  139366.666667  136933.333333      1.01777

Upvotes: 1

Related Questions