Reputation: 584
I am trying to create a very large dataframe, made up of one column from each of many smaller dataframes (each column renamed to its source dataframe's name). I am using pd.concat(), looping through dictionary values which represent dataframes, and looping over index values, to create the large dataframe. The concat join_axes argument is the index common to all the dataframes. This works fine, however I then have duplicate column names.
I must be able to loop over the indexes at specific windows as part of my final dataframe creation, so removing this step isn't an option.
For example, this results in the following final dataframe with duplicate columns:
Is there any way I can use pd.concat() exactly as I am, but merge the columns to produce an output like so?:
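To illustrate, here is a minimal sketch that reproduces the duplicate-column problem (the names dfs, common_idx, val and the window slices are placeholders, not my real data):
import pandas as pd

common_idx = pd.RangeIndex(4)
dfs = {'a': pd.DataFrame({'val': [1, 2, 3, 4]}, index=common_idx),
       'b': pd.DataFrame({'val': [5, 6, 7, 8]}, index=common_idx)}

pieces = []
for window in (common_idx[:2], common_idx[2:]):        # loop over index windows
    for name, d in dfs.items():                        # loop over the dictionary
        pieces.append(d.loc[window, ['val']].rename(columns={'val': name}))

# (join_axes=[common_idx] in older pandas; here .reindex() aligns on the common index)
big = pd.concat(pieces, axis=1).reindex(common_idx)
print(big.columns.tolist())   # ['a', 'b', 'a', 'b'] -> duplicate column names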
Upvotes: 0
Views: 2192
Reputation: 862521
I think you need:
df = pd.concat([df1, df2])
Or, if you have duplicate column names, use groupby, where overlapping values are summed:
print (df.groupby(level=0, axis=1).sum())
Sample:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A':[5,8,7,np.nan],
                    'B':[1,np.nan,np.nan,9],
                    'C':[7,3,np.nan,0]})
df2 = pd.DataFrame({'A':[np.nan,np.nan,np.nan,2],
                    'B':[1,2,np.nan,np.nan],
                    'C':[np.nan,6,np.nan,3]})
print (df1)
A B C
0 5.0 1.0 7.0
1 8.0 NaN 3.0
2 7.0 NaN NaN
3 NaN 9.0 0.0
print (df2)
A B C
0 NaN 1.0 NaN
1 NaN 2.0 6.0
2 NaN NaN NaN
3 2.0 NaN 3.0
df = pd.concat([df1, df2],axis=1)
print (df)
A B C A B C
0 5.0 1.0 7.0 NaN 1.0 NaN
1 8.0 NaN 3.0 NaN 2.0 6.0
2 7.0 NaN NaN NaN NaN NaN
3 NaN 9.0 0.0 2.0 NaN 3.0
print (df.groupby(level=0, axis=1).sum())
A B C
0 5.0 2.0 7.0
1 8.0 2.0 9.0
2 7.0 NaN NaN
3 2.0 9.0 3.0
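Note: newer pandas versions deprecate axis=1 in groupby, so the same result can be obtained by transposing instead (min_count=1 keeps all-NaN groups as NaN rather than 0):
print (df.T.groupby(level=0).sum(min_count=1).T)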
Upvotes: 2
Reputation: 1047
What you want is df1.combine_first(df2). Refer to the pandas documentation.
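For example (df1 and df2 below are just made-up frames to show the behaviour):
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [5, np.nan], 'B': [1, np.nan]})
df2 = pd.DataFrame({'A': [np.nan, 2], 'B': [np.nan, 8]})

# combine_first keeps df1's values and fills its NaNs from df2
print (df1.combine_first(df2))
     A    B
0  5.0  1.0
1  2.0  8.0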
Upvotes: 1