RustyShackleford

Reputation: 3677

How to prevent pd.concat from inserting `.1` after columns with the same name?

I am trying to merge 2 datasets together where column names overlap.

For example, this is what I have and what I want:

df1:

col1   col2
aa     aa
bb     bb

df2:

col2   col3
cc     dd

new_df = pd.concat([df1,df2],axis=1)

new_df:

    col1   col2    col3
    aa     aa
    bb     bb
           cc     dd

When I run the above line in my code I get something like this:

  col1   col2   col2.1   col3
    aa     aa     nan
    bb     bb     nan
           cc     nan     dd

How do I prevent the .1 from appearing and force pd.concat to match the column names and insert data?

Upvotes: 1

Views: 357

Answers (1)

willeM_ Van Onsem

Reputation: 477308

You concatenated along the wrong axis. With axis=1 you concatenate along the column axis, which places the frames side by side, whereas you want to concatenate along the index axis:

>>> pd.concat([df1, df2], axis='rows')
  col1 col2 col3
0   aa   aa  NaN
1   bb   bb  NaN
0  NaN   cc   dd

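So by specifying axis=0, axis='rows', or axis='index', or by omitting the argument entirely (0 is the default), the columns are matched by name and the frames are stacked vertically. Columns that are missing from one of the frames are filled with NaN.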
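For reference, here is a minimal, self-contained sketch of the same idea. The frames are rebuilt from the sample data in the question, and `ignore_index=True` is an optional addition (not part of the original answer) that renumbers the rows so the index does not repeat 0:

```python
import pandas as pd

# Sample frames reconstructed from the question; col2 appears in both.
df1 = pd.DataFrame({"col1": ["aa", "bb"], "col2": ["aa", "bb"]})
df2 = pd.DataFrame({"col2": ["cc"], "col3": ["dd"]})

# axis=0 stacks the frames vertically and aligns columns by name.
# ignore_index=True is optional: it replaces the original row labels
# with a fresh 0..n-1 index.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

print(stacked)
```

If the original row labels matter, drop `ignore_index=True` and the result keeps the index values from each input frame, as in the output shown above.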

Upvotes: 3
