Jaffer Wilson
Jaffer Wilson

Reputation: 7273

Columns appending is troublesome with Pandas

Here is what I have tried and what error I received:

>>> import pandas as pd
>>> df = pd.DataFrame({"A":[1,2,3,4,5],"B":[5,4,3,2,1],"C":[0,0,0,0,0],"D":[1,1,1,1,1]})
>>> df
   A  B  C  D
0  1  5  0  1
1  2  4  0  1
2  3  3  0  1
3  4  2  0  1
4  5  1  0  1
>>> import pandas as pd
>>> df = pd.DataFrame({"A":[1,2,3,4,5],"B":[5,4,3,2,1],"C":[0,0,0,0,0],"D":[1,1,1,1,1]})
>>> first = [2,2,2,2,2,2,2,2,2,2,2,2]
>>> first = pd.DataFrame(first).T
>>> first.index = [2]
>>> df = df.join(first)
>>> df
   A  B  C  D    0    1    2    3    4    5    6    7    8    9   10   11
0  1  5  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1  2  4  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
2  3  3  0  1  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0
3  4  2  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
4  5  1  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
>>> second = [3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3]
>>> second = pd.DataFrame(second).T
>>> second.index = [1]
>>> df = df.join(second)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python35\lib\site-packages\pandas\core\frame.py", line 6815, in join
    rsuffix=rsuffix, sort=sort)
  File "C:\Python35\lib\site-packages\pandas\core\frame.py", line 6830, in _join_compat
    suffixes=(lsuffix, rsuffix), sort=sort)
  File "C:\Python35\lib\site-packages\pandas\core\reshape\merge.py", line 48, in merge
    return op.get_result()
  File "C:\Python35\lib\site-packages\pandas\core\reshape\merge.py", line 552, in get_result
    rdata.items, rsuf)
  File "C:\Python35\lib\site-packages\pandas\core\internals\managers.py", line 1972, in items_overlap_with_suffix
    '{rename}'.format(rename=to_rename))
ValueError: columns overlap but no suffix specified: Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], dtype='object')

I am trying to create new list with the extra columns which I have to add at specific indexes of the main dataframe df.
When i tried the first it worked and you can see the output. But when I tried the same way with second I received the above mentioned error.

Kindly, let me know what I can do in this situation and achieve the goal I am expecting.

Upvotes: 1

Views: 42

Answers (3)

jezrael
jezrael

Reputation: 862671

Use DataFrame.combine_first instead join if need assign to same columns created before, last DataFrame.reindex by list of columns for expected ordering:

df = pd.DataFrame({"A":[1,2,3,4,5],"B":[5,4,3,2,1],"C":[0,0,0,0,0],"D":[1,1,1,1,1]})
orig = df.columns.tolist()

first = [2,2,2,2,2,2,2,2,2,2,2,2]
first = pd.DataFrame(first).T
first.index = [2]
df = df.combine_first(first)

second = [3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3]
second = pd.DataFrame(second).T
second.index = [1]
df = df.combine_first(second)

df = df.reindex(orig + first.columns.tolist(), axis=1)
print (df)
   A  B  C  D    0    1    2    3    4    5    6    7    8    9   10   11
0  1  5  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1  2  4  0  1  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0
2  3  3  0  1  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0  2.0
3  4  2  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
4  5  1  0  1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN

Upvotes: 5

Erfan
Erfan

Reputation: 42916

You have to specify the suffixes since the column names are the same. Assuming you are trying to add the second values as new columns horizontally:

df = df.join(second, lsuffix='first', rsuffix='second')

   A  B  C  D  0first  1first  2first  3first  4first  5first  ...  10second  11second   12   13   14   15   16   17   18   19
0  1  5  0  1     NaN     NaN     NaN     NaN     NaN     NaN  ...       NaN       NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1  2  4  0  1     NaN     NaN     NaN     NaN     NaN     NaN  ...       3.0       3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0  3.0
2  3  3  0  1     2.0     2.0     2.0     2.0     2.0     2.0  ...       NaN       NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
3  4  2  0  1     NaN     NaN     NaN     NaN     NaN     NaN  ...       NaN       NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
4  5  1  0  1     NaN     NaN     NaN     NaN     NaN     NaN  ...       NaN       NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN

Upvotes: 3

qmeeus
qmeeus

Reputation: 2402

Yes this is expected behaviour because join works much like an SQL join, meaning that it will join on the provided index and concatenate all the columns together. The problem arises from the fact that pandas does not accept two columns to have the same name. Hence, if you have 2 columns in each dataframe with the same name, it will first look for a suffix to add to those columns to avoid name clashes. This is controlled with the lsuffix and rsuffix arguments in the join method.

Conclusion: 2 ways to solve this:

  • Either provide a suffix so that pandas is able to resolve the name clashes; or
  • Make sure that you don't have overlapping columns

Upvotes: 3

Related Questions