Johann Hibschman
Johann Hibschman

Reputation: 2057

Inner join with MultiIndex fails if no overlap

I can't get "empty" inner joins to work with a MultiIndex. Under 0.10.1, I have:

d1 = pd.DataFrame({
    'i1': [1, 2, 2],
    'i2': [1, 1, 2],
     'a': [10,20,30]}).set_index(['i1', 'i2'])
d2 = pd.DataFrame({
    'i1': [3, 3],
    'i2': [1, 2],
    'b': [40, 50]}).set_index(['i1', 'i2'])
d1.join(d2, how='inner')

which gives me

Exception: Cannot infer number of levels from empty list

Is there any good way around this? I'd like to be able to tell in advance if the intersection is empty, so I can avoid the exception.

Upvotes: 7

Views: 1930

Answers (3)

Mattijn
Mattijn

Reputation: 13910

Got his problem in for loop by merging multi index nodes from HDFstore chunk by chunk. Solved it ugly like this, might be helpful for someone else later on.

import pandas as pd
d1 = pd.DataFrame({
    'i1': [1, 2, 2],
    'i2': [1, 1, 2],
     'a': [10,20,30]}).set_index(['i1', 'i2'])
d2 = pd.DataFrame({
    'i1': [3, 3],
    'i2': [1, 2],
    'b': [40, 50]}).set_index(['i1', 'i2'])
for x in y:
    try:
        d3 = d1.join(d2, how='inner')
    except Exception:            
        print "no merge possible between rows, but let's continue"
        d3 = d1.join(d2, how='outer').dropna()
    if len(d3)
        print "there's a merge"
        #action
    print "fail, but still in the race"

Upvotes: 1

Paul H
Paul H

Reputation: 68186

I'm not 100% on this, but doing an outer join and dropping the NAs is the same as an inner join. So in the case of no matching indicies, you just get an empty dataframe. If we modify your example to include one matching record, this appears to be the case:

import pandas as pd
d1 = pd.DataFrame({
    'i1': [1, 2, 2],
    'i2': [1, 1, 2],
    'a': [10,20,30]}).set_index(['i1', 'i2'])
d2 = pd.DataFrame({
    'i1': [1, 3],
    'i2': [1, 2],
    'b': [40, 50]}).set_index(['i1', 'i2'])
d3 = d1.join(d2, how='outer').dropna()
d4 = d1.join(d2, how='inner')

Which gives:

In [9]: d3
Out[9]: 
        a   b
i1 i2        
1  1   10  40

In [10]: d4
Out[10]: 
        a   b
i1 i2        
1  1   10  40

And so after the outer join + dropna(), you can see how many rows d3 and go from there. Using your original example:

import pandas as pd
d1 = pd.DataFrame({
    'i1': [1, 2, 2],
    'i2': [1, 1, 2],
     'a': [10,20,30]}).set_index(['i1', 'i2'])
d2 = pd.DataFrame({
    'i1': [3, 3],
    'i2': [1, 2],
    'b': [40, 50]}).set_index(['i1', 'i2'])
d3 = d1.join(d2, how='outer').dropna()
print(d3.shape) # no error, shows "(0, 2)"

Upvotes: 2

unutbu
unutbu

Reputation: 880339

It is easier to ask forgiveness than permission:

import pandas as pd
d1 = pd.DataFrame({'i1': [1, 2, 2], 'i2': [1, 1, 2], 'a': [10, 20, 30]}
                  ).set_index(['i1', 'i2'])
d2 = pd.DataFrame(
    {'i1': [3, 3], 'i2': [1, 2], 'b': [40, 50]}).set_index(['i1', 'i2'])
try:
    d1.join(d2, how='inner')
except Exception as err:
    # Change this to however you wish to handle this case.
    print(err)

Upvotes: 5

Related Questions