Caledonian26
Caledonian26

Reputation: 809

Problem with using inner-join to merge two dataframes

I have the following code below:

universitytowns = pd.merge(houses,unitowns,how='inner',on=['State','RegionName'])

However, my output is:

Empty DataFrame
Columns: [State, RegionName, 2000q1, 2000q2, 2000q3, 2000q4, 2001q1, 2001q2, 2001q3, 2001q4, 2002q1, 2002q2, 2002q3, 2002q4, 2003q1, 2003q2, 2003q3, 2003q4, 2004q1, 2004q2, 2004q3, 2004q4, 2005q1, 2005q2, 2005q3, 2005q4, 2006q1, 2006q2, 2006q3, 2006q4, 2007q1, 2007q2, 2007q3, 2007q4, 2008q1, 2008q2, 2008q3, 2008q4, 2009q1, 2009q2, 2009q3, 2009q4, 2010q1, 2010q2, 2010q3, 2010q4, 2011q1, 2011q2, 2011q3, 2011q4, 2012q1, 2012q2, 2012q3, 2012q4, 2013q1, 2013q2, 2013q3, 2013q4, 2014q1, 2014q2, 2014q3, 2014q4, 2015q1, 2015q2, 2015q3, 2015q4, 2016q1, 2016q2, 2016q3, difference]
Index: []

[0 rows x 70 columns]

I am confused as to why this is...

The dataframe houses looks like this:

State    RegionName         2000q1         2000q2         2000q3  \
0      New York      New York            NaN            NaN            NaN   
1    California   Los Angeles  207066.666667  214466.666667  220966.666667   
2      Illinois       Chicago  138400.000000  143633.333333  147866.666667   
3  Pennsylvania  Philadelphia   53000.000000   53633.333333   54133.333333   
4       Arizona       Phoenix  111833.333333  114366.666667  116000.000000   

          2000q4         2001q1         2001q2         2001q3         2001q4  \
0            NaN            NaN            NaN            NaN            NaN   
1  226166.666667  233000.000000  239100.000000  245066.666667  253033.333333   
2  152133.333333  156933.333333  161800.000000  166400.000000  170433.333333   
3   54700.000000   55333.333333   55533.333333   56266.666667   57533.333333   
4  117400.000000  119600.000000  121566.666667  122700.000000  124300.000000   

       ...              2014q3         2014q4         2015q1         2015q2  \
0      ...       522800.000000  528066.666667  532266.666667  540800.000000   
1      ...       509066.666667  518866.666667  528800.000000  538166.666667   
2      ...       195766.666667  201266.666667  201066.666667  206033.333333   
3      ...       115300.000000  115666.666667  116200.000000  117966.666667   
4      ...       165366.666667  168500.000000  171533.333333  174166.666667   

          2015q3         2015q4         2016q1         2016q2    2016q3  \
0  557200.000000  572833.333333  582866.666667  591633.333333  587200.0   
1  547266.666667  557733.333333  566033.333333  577466.666667  584050.0   
2  208300.000000  207900.000000  206066.666667  208200.000000  212000.0   
3  121233.333333  122200.000000  123433.333333  126933.333333  128700.0   
4  179066.666667  183833.333333  187900.000000  191433.333333  195200.0   

     difference  
0  33933.333333  
1  55600.000000  
2  12300.000000  
3    766.666667  
4  25533.333333  

The dataframe unitowns looks like this:

[5 rows x 70 columns]
     State     RegionName
0  Alabama        Auburn 
1  Alabama      Florence 
2  Alabama  Jacksonville 
3  Alabama    Livingston 
4  Alabama    Montevallo 

Would anybody be able to give me a helping hand?

Upvotes: 0

Views: 42

Answers (1)

NYC Coder
NYC Coder

Reputation: 7614

It's working for me. Are you sure the 2 dataframes have a common state and RegionName. I just modified the unitowns dataframe to include New York and New York and it worked, or there might be some extra characters, space, etc.

universitytowns = pd.merge(houses,unitowns,how='inner',on=['State','RegionName'])
print(universitytowns)

      State RegionName  2000q1  2000q2  2000q3
0  New York   New York     NaN     NaN     NaN

Upvotes: 1

Related Questions