user2725109

Reputation: 2386

Bug in pandas.DataFrame.merge?

The following:

import pandas as pd

q = pd.DataFrame([[1,2],[3,4]])
r = pd.DataFrame([[1,2],[5,6]], columns=['a','b'])
pd.merge(q, r, left_on=q.columns, right_on=r.columns, how='left')

raises an error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The following, which converts both column indexes to lists first, doesn't:

q = pd.DataFrame([[1,2],[3,4]])
r = pd.DataFrame([[1,2],[5,6]], columns=['a','b'])
pd.merge(q, r, left_on=q.columns.tolist(), right_on=r.columns.tolist(), how='left')

Is this a bug?

Upvotes: 3

Views: 1394

Answers (1)

Dennis Golomazov

Reputation: 17359

It depends on what pandas considers array-like. It might also be a bug in the documentation.

Pandas checks the type of the left_on and right_on parameters (see the _maybe_make_list function in the pandas source). Since neither is a tuple or a list (q.columns is a RangeIndex and r.columns is an Index), each one gets wrapped in a single-element list, so it basically does:

[q.columns] == [r.columns]

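instead of comparing the labels directly. Comparing the two single-element lists evaluates q.columns == r.columns elementwise and then takes the truth value of the resulting boolean array, which raises the error.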
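Here is a minimal sketch that reproduces the failure outside of merge. The maybe_make_list helper below is my own stand-in for the check described above, not the actual pandas code, so treat it as an assumption about what happens internally:

```python
import pandas as pd


def maybe_make_list(obj):
    # Hypothetical stand-in for the type check described above:
    # anything that is not a tuple or list gets wrapped in a list.
    if obj is not None and not isinstance(obj, (tuple, list)):
        return [obj]
    return obj


q = pd.DataFrame([[1, 2], [3, 4]])
r = pd.DataFrame([[1, 2], [5, 6]], columns=['a', 'b'])

# Index objects are not lists, so each becomes a one-element list.
left_on = maybe_make_list(q.columns)
right_on = maybe_make_list(r.columns)

try:
    # List equality compares the elements and needs a single True/False,
    # but q.columns == r.columns is an elementwise boolean array.
    left_on == right_on
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Plain lists pass through unchanged and compare without trouble.
left_labels = maybe_make_list(q.columns.tolist())
right_labels = maybe_make_list(r.columns.tolist())
print(left_labels == right_labels)
```

With .tolist() the labels are ordinary Python values, so the list comparison yields a plain boolean and nothing ambiguous is evaluated.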

The documentation describes left_on as "label or list, or array-like". I couldn't find a definition of array-like in pandas, but in this case it seems to be limited to a tuple or a list.

Upvotes: 3
