Akash Warkhade
Akash Warkhade

Reputation: 69

Vertically merge dataframes

I have a query regarding merging 4 dataframes For example i have 4 dataframes as below :

print(df1)

SET I      Violations
Rule 1            1
Rule 2            1
Rule 3            6

print(df2)

SET II       Violations
Rule 1            2
Rule 2            3
Rule 3            6

print(df3)

SET III       Violations
Rule 1            2
Rule 2            4
Rule 3            8


print(df4)

SET IV       Violations
Rule 1            2
Rule 2            5
Rule 3            8

My expected output :

SET I      Violations
Rule 1            1
Rule 2            1
Rule 3            6


SET II       Violations
Rule 1            2
Rule 2            3
Rule 3            6

SET III       Violations
Rule 1            2
Rule 2            4
Rule 3            8

SET IV       Violations
Rule 1            2
Rule 2            5
Rule 3            8

Outputs i am getting right now :

a)

SET I   SET II    SET III   SET IV  Violations
Rule 1                                     1
Rule 2                                     1
Rule 3                                     6 
        Rule 1                             2
        Rule 2                             3
        Rule 3                             6
         Rule 1                    2
                 Rule 2                    4
                 Rule 3                    8
                       Rule 1          2
                           Rule 2          5
                           Rule 3          8

command_used:

pandas.concat([df1,df2,df3,df4],axis=0,ignore_index=True)

b)

Rule 1  1   Rule 1    2    Rule 1  2      Rule 1  2 
Rule 2  1   Rule 2    3    Rule 2  4      Rule 2  5
Rule 3  6   Rule 3    6    Rule 3  8      Rule 3  8

command_used:

pandas.concat([df1,df2,df3,df4],axis=1,ignore_index=True)

Please help me with above

Upvotes: 1

Views: 112

Answers (2)

Chris Adams
Chris Adams

Reputation: 18647

If you need the column headings as row values within the final DataFrame and a blank row between sets, then you can try transposing, reset_index, then transposing back then DataFrame.append before using pandas.concat:

s = pd.Series([''] * df1.shape[1], name='blank')

df_new = pd.concat([df.T.reset_index().T.append(s) for df in [df1, df2, df3, df4]], ignore_index=True)
df_new.columns = ['SET', 'VIOLATIONS']
print(df_new)

        SET  VIOLATIONS
0     SET I  Violations
1    Rule 1           1
2    Rule 2           1
3    Rule 3           6
4                      
5    SET II  Violations
6    Rule 1           2
7    Rule 2           3
8    Rule 3           6
9                      
10  SET III  Violations
11   Rule 1           2
12   Rule 2           4
13   Rule 3           8
14                     
15   SET IV  Violations
16   Rule 1           2
17   Rule 2           5
18   Rule 3           8
19                     

Otherwise use:

df_new = pd.concat([df.set_axis(range(len(s)), axis=1, inplace=False).append(s)
                    for df in [df1, df2, df3, df4]], ignore_index=True)
df_new.columns = ['SET', 'VIOLATIONS']
print(df_new)

       SET VIOLATIONS
0   Rule 1          1
1   Rule 2          1
2   Rule 3          6
3                    
4   Rule 1          2
5   Rule 2          3
6   Rule 3          6
7                    
8   Rule 1          2
9   Rule 2          4
10  Rule 3          8
11                   
12  Rule 1          2
13  Rule 2          5
14  Rule 3          8
15                   

Edit

To output to an excel workbook without the index, use DataFrame.to_excel:

df_new.to_excel('./workbookname.xls', index=False)

     SET  VIOLATIONS
       0           1
  Rule 1           1
  Rule 2           1
  Rule 3           6

  SET II  Violations
  Rule 1           2
  Rule 2           3
  Rule 3           6

 SET III  Violations
  Rule 1           2
  Rule 2           4
  Rule 3           8

  SET IV  Violations
  Rule 1           2
  Rule 2           5
  Rule 3           8

Edit 2:

To add sum of violations, list comprehension is not longer viable option. This time we'll use a for loop, append results to a list, and concat them at the end:

df_new = []

for df in [df1, df2, df3, df4]:
    total = pd.Series(['Total', df.iloc[:, 1].sum()], name='sum')
    blank = pd.Series(['', ''], name='blank')
    df_new.append(df.T.reset_index().T.append(total).append(blank))

df_new = pd.concat(df_new, ignore_index=True).rename(columns={0:'SET', 1:'VIOLATION'})
df_new.to_excel('./workbookname.xls', index=False)

Upvotes: 4

Erfan
Erfan

Reputation: 42916

The reason you get this output, is the column names of each SET column is different. You have to harmonize these names before using pd.concat. Plus in pd.concat you are using the wrong axis.

dfs = [df1, df2, df3, df4]

for d in dfs:
    d.columns = [col[:3] if 'SET' in col else col for col in d.columns]

df_all = pd.concat(dfs, ignore_index=True)

Which yields:

print(df_all)
       SET  Violations
0   Rule 1           1
1   Rule 2           1
2   Rule 3           6
3   Rule 1           2
4   Rule 2           3
5   Rule 3           6
6   Rule 1           2
7   Rule 2           4
8   Rule 3           8
9   Rule 1           2
10  Rule 2           5
11  Rule 3           8

Upvotes: 1

Related Questions