Bong Kyo Seo
Bong Kyo Seo

Reputation: 391

Trying to concat two dataframes on top of each other but having some troubles with index values and length mismatch

I have two dataframes that I want to concatenate on top of each other. The DFs look like:

DF1 (3 columns, many rows in multiples of 3 duplicates)

    col1   col2   col3
0   A1     A2     A3
0   A1     A2     A3
0   A1     A2     A3
1   A4     A5     A6
1   A4     A5     A6
1   A4     A5     A6

DF2 (6 columns that are in multiples of 2 column headers and an empty col, many rows)

    col4  col5 'empty column' col4 col5 'empty column'
0    1    2                   3    4     
1    5    6                   7    8

Here are the steps that I've taken:

1) Transpose DF1 by:

DF1 = DF1.reset_index().transpose()

I was hoping that transposing will match the 3 duplicates to the col4, col5 and empty column of DF2

2) concat the two DFs:

DF3 = pd.concat([DF1, DF2], ignore_index=True)
DF3.to_excel('list.xlsx', header=False, index=False)

However, this will simply output DF1 with index also transposed even with reset_index:

0   0   0   1   1   1
A1  A1  A1  A4  A4  A4
A2  A2  A2  A5  A5  A5
A3  A3  A3  A6  A6  A6

I am sure I did the transpose and/or concat wrong but I cannot pinpoint where. Am I missing something?

Edit)

The expected output should be:

A1    A1    A1    A4    A4    A4
A2    A2    A2    A5    A5    A5
A3    A3    A3    A6    A6    A6
col4  col5        col4  col5
1     2           3     4
5     6           7     8

Upvotes: 1

Views: 427

Answers (1)

jezrael
jezrael

Reputation: 863116

Idea is create same columns values in each DataFrame:

df1 = DF1.reset_index(drop=True).T
df2 = DF2.copy()
df2.columns = np.arange(len(df2.columns))

df = pd.concat([df1, df2], ignore_index=True)
print (df)
    0   1    2   3   4    5
0  A1  A1   A1  A4  A4   A4
1  A2  A2   A2  A5  A5   A5
2  A3  A3   A3  A6  A6   A6
3   1   2  NaN   3   4  NaN
4   5   6  NaN   7   8  NaN

If need columns from DF2 to row between:

df1 = DF1.reset_index(drop=True).T
df2 = DF2.copy()
df2.columns = np.arange(len(df2.columns))

#convert columns to one row DataFrame
df3 = DF2.columns.to_frame().T
df3.columns = np.arange(len(df3.columns))

df = pd.concat([df1, df3, df2], ignore_index=True)
print (df)
      0     1    2     3     4    5
0    A1    A1   A1    A4    A4   A4
1    A2    A2   A2    A5    A5   A5
2    A3    A3   A3    A6    A6   A6
3  col4  col5       col4  col5     
4     1     2  NaN     3     4  NaN
5     5     6  NaN     7     8  NaN

If need columns from DF2 in top solution is set columns of df1 by DF2 columns:

df1 = DF1.reset_index(drop=True).T
df1.columns = DF2.columns

df = pd.concat([df1, DF2], ignore_index=True)
print (df)
  col4 col5      col4 col5     
0   A1   A1   A1   A4   A4   A4
1   A2   A2   A2   A5   A5   A5
2   A3   A3   A3   A6   A6   A6
3    1    2  NaN    3    4  NaN
4    5    6  NaN    7    8  NaN

Disadvantage of solution is get duplicated columns names, so if seelct one column get all columns with dulicated columns names:

print (df['col4'])
  col4 col4
0   A1   A4
1   A2   A5
2   A3   A6
3    1    3
4    5    7

Upvotes: 2

Related Questions