Reputation: 2057
I can't get "empty" inner joins to work with a MultiIndex. Under 0.10.1, I have:
d1 = pd.DataFrame({
'i1': [1, 2, 2],
'i2': [1, 1, 2],
'a': [10,20,30]}).set_index(['i1', 'i2'])
d2 = pd.DataFrame({
'i1': [3, 3],
'i2': [1, 2],
'b': [40, 50]}).set_index(['i1', 'i2'])
d1.join(d2, how='inner')
which gives me
Exception: Cannot infer number of levels from empty list
Is there any good way around this? I'd like to be able to tell in advance if the intersection is empty, so I can avoid the exception.
Upvotes: 7
Views: 1930
Reputation: 13910
Got his problem in for loop by merging multi index nodes from HDFstore chunk by chunk. Solved it ugly like this, might be helpful for someone else later on.
import pandas as pd
d1 = pd.DataFrame({
'i1': [1, 2, 2],
'i2': [1, 1, 2],
'a': [10,20,30]}).set_index(['i1', 'i2'])
d2 = pd.DataFrame({
'i1': [3, 3],
'i2': [1, 2],
'b': [40, 50]}).set_index(['i1', 'i2'])
for x in y:
try:
d3 = d1.join(d2, how='inner')
except Exception:
print "no merge possible between rows, but let's continue"
d3 = d1.join(d2, how='outer').dropna()
if len(d3)
print "there's a merge"
#action
print "fail, but still in the race"
Upvotes: 1
Reputation: 68186
I'm not 100% on this, but doing an outer join and dropping the NAs is the same as an inner join. So in the case of no matching indicies, you just get an empty dataframe. If we modify your example to include one matching record, this appears to be the case:
import pandas as pd
d1 = pd.DataFrame({
'i1': [1, 2, 2],
'i2': [1, 1, 2],
'a': [10,20,30]}).set_index(['i1', 'i2'])
d2 = pd.DataFrame({
'i1': [1, 3],
'i2': [1, 2],
'b': [40, 50]}).set_index(['i1', 'i2'])
d3 = d1.join(d2, how='outer').dropna()
d4 = d1.join(d2, how='inner')
Which gives:
In [9]: d3
Out[9]:
a b
i1 i2
1 1 10 40
In [10]: d4
Out[10]:
a b
i1 i2
1 1 10 40
And so after the outer join + dropna()
, you can see how many rows d3
and go from there. Using your original example:
import pandas as pd
d1 = pd.DataFrame({
'i1': [1, 2, 2],
'i2': [1, 1, 2],
'a': [10,20,30]}).set_index(['i1', 'i2'])
d2 = pd.DataFrame({
'i1': [3, 3],
'i2': [1, 2],
'b': [40, 50]}).set_index(['i1', 'i2'])
d3 = d1.join(d2, how='outer').dropna()
print(d3.shape) # no error, shows "(0, 2)"
Upvotes: 2
Reputation: 880339
It is easier to ask forgiveness than permission:
import pandas as pd
d1 = pd.DataFrame({'i1': [1, 2, 2], 'i2': [1, 1, 2], 'a': [10, 20, 30]}
).set_index(['i1', 'i2'])
d2 = pd.DataFrame(
{'i1': [3, 3], 'i2': [1, 2], 'b': [40, 50]}).set_index(['i1', 'i2'])
try:
d1.join(d2, how='inner')
except Exception as err:
# Change this to however you wish to handle this case.
print(err)
Upvotes: 5