Group by columns while merging data frames

Question

I am merging several data frames into one data frame in a for loop. something like this:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(data=np.random.randint(0,100,(2,5)),columns=list('ABCDE'))
df2 = pd.DataFrame(data=np.random.randint(0,100,(2,5)),columns=list('GHABC'))
df1 = df1.merge(df2, how='outer', left_index=True, right_index=True,suffixes=('', '_' + 'second'))

Several columns from each of these data frames have similar names so each time there is such a column I am adding a suffix. It is becoming a mess. In addition, I would like to be able to quickly access all the columns from the first table, second one, etc. Is there a way to merge these columns but keep them as a group? So that I don't need to change the column name and I can access all the columns of each data set easier?

Georgy · Accepted Answer

Here is a way to merge the dataframes horizonthally with the help of MultiIndex, which has some advantages comparing to vertical merge. For example, you won't have lots of NaN fields, and the dtype's won't change from int to float like in the case with horizonthal merge.

import numpy as np
import pandas as pd


df1 = pd.DataFrame(data=np.random.randint(0, 100, (2, 5)),
                   columns=list('ABCDE'))
df2 = pd.DataFrame(data=np.random.randint(0, 100, (2, 5)),
                   columns=list('GHABC'))
dfs = [df1, df2]
result = pd.concat(dfs, axis=1, keys=range(len(dfs)))
print(result)

This will give:

    0                   1                
    A   B   C   D   E   G   H   A   B   C
0  41  49  13  36  57  28  12  82  18  67
1  72  91  34  17  12   6  67  98  36  25

You can access each group in a loop:

for source_index, df in result.groupby(axis=1, level=0):
    print(df)

    0                
    A   B   C   D   E
0  41  49  13  36  57
1  72  91  34  17  12
    1                
    G   H   A   B   C
0  28  12  82  18  67
1   6  67  98  36  25

or individually:

gb = result.groupby(axis=1, level=0)
first_group = gb.get_group(0)
print(first_group)

    0                
    A   B   C   D   E
0  41  49  13  36  57
1  72  91  34  17  12

References:

Group by columns while merging data frames

Answers (2)

Related Questions