Reputation: 241
I have two tables: sales table & product table and these two tables share the 'PART NUMBER' column. The 'PART NUMBER' column in the sales table is not unique, but it is unique in the product table. (see image below of a snapshot of the sales table & product table)
I was trying to add the equivalent 'Description' to each 'PART NUMBER' on the sales table, and I followed the examples from the pandas website my code
sales.join(part_table, on='PART NUMBER')
But I got this error:
ValueError: columns overlap but no suffix specified: Index([u'PART NUMBER'], dtype='object')
Can someone explain what this error means and how to solve it?
Many thanks!
Upvotes: 7
Views: 31126
Reputation: 375405
I think you want to do a merge rather than a join:
sales.merge(part_table)
Here's an example dataframe:
In [11]: dfa = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])
In [12]: dfb = pd.DataFrame([[1, 'a'], [3, 'b'], [3, 'c']], columns=['A', 'C'])
In [13]: dfa.join(dfb, on=['A'])
ValueError: columns overlap but no suffix specified: Index([u'A'], dtype='object')
In [14]: dfa.merge(dfb)
Out[14]:
A B C
0 1 2 a
1 3 4 b
2 3 4 c
It's unclear from the docs if this is intentational (I thought that on
would be used as the column) but following the exceptions message if you add suffixs we can see what's going on:
In [21]: dfb.join(dfa, on=['A'], lsuffix='_a', rsuffix='_b')
Out[21]:
A_a C A_b B
0 1 a 3 4
1 3 b NaN NaN
2 3 c NaN NaN
In [22]: dfb.join(dfa, lsuffix='_a', rsuffix='_b')
Out[22]:
A_a C A_b B
0 1 a 1 2
1 3 b 3 4
2 3 c NaN NaN
It's ignoring the on kwarg and just doing the join.
Upvotes: 13