Reputation: 69
I have a query regarding merging 4 dataframes For example i have 4 dataframes as below :
print(df1)
SET I Violations
Rule 1 1
Rule 2 1
Rule 3 6
print(df2)
SET II Violations
Rule 1 2
Rule 2 3
Rule 3 6
print(df3)
SET III Violations
Rule 1 2
Rule 2 4
Rule 3 8
print(df4)
SET IV Violations
Rule 1 2
Rule 2 5
Rule 3 8
My expected output :
SET I Violations
Rule 1 1
Rule 2 1
Rule 3 6
SET II Violations
Rule 1 2
Rule 2 3
Rule 3 6
SET III Violations
Rule 1 2
Rule 2 4
Rule 3 8
SET IV Violations
Rule 1 2
Rule 2 5
Rule 3 8
Outputs i am getting right now :
a)
SET I SET II SET III SET IV Violations
Rule 1 1
Rule 2 1
Rule 3 6
Rule 1 2
Rule 2 3
Rule 3 6
Rule 1 2
Rule 2 4
Rule 3 8
Rule 1 2
Rule 2 5
Rule 3 8
command_used:
pandas.concat([df1,df2,df3,df4],axis=0,ignore_index=True)
b)
Rule 1 1 Rule 1 2 Rule 1 2 Rule 1 2
Rule 2 1 Rule 2 3 Rule 2 4 Rule 2 5
Rule 3 6 Rule 3 6 Rule 3 8 Rule 3 8
command_used:
pandas.concat([df1,df2,df3,df4],axis=1,ignore_index=True)
Please help me with above
Upvotes: 1
Views: 112
Reputation: 18647
If you need the column headings as row values within the final DataFrame
and a blank row between sets, then you can try transposing, reset_index
, then transposing back then DataFrame.append
before using pandas.concat
:
s = pd.Series([''] * df1.shape[1], name='blank')
df_new = pd.concat([df.T.reset_index().T.append(s) for df in [df1, df2, df3, df4]], ignore_index=True)
df_new.columns = ['SET', 'VIOLATIONS']
print(df_new)
SET VIOLATIONS
0 SET I Violations
1 Rule 1 1
2 Rule 2 1
3 Rule 3 6
4
5 SET II Violations
6 Rule 1 2
7 Rule 2 3
8 Rule 3 6
9
10 SET III Violations
11 Rule 1 2
12 Rule 2 4
13 Rule 3 8
14
15 SET IV Violations
16 Rule 1 2
17 Rule 2 5
18 Rule 3 8
19
Otherwise use:
df_new = pd.concat([df.set_axis(range(len(s)), axis=1, inplace=False).append(s)
for df in [df1, df2, df3, df4]], ignore_index=True)
df_new.columns = ['SET', 'VIOLATIONS']
print(df_new)
SET VIOLATIONS
0 Rule 1 1
1 Rule 2 1
2 Rule 3 6
3
4 Rule 1 2
5 Rule 2 3
6 Rule 3 6
7
8 Rule 1 2
9 Rule 2 4
10 Rule 3 8
11
12 Rule 1 2
13 Rule 2 5
14 Rule 3 8
15
To output to an excel workbook without the index, use DataFrame.to_excel
:
df_new.to_excel('./workbookname.xls', index=False)
SET VIOLATIONS
0 1
Rule 1 1
Rule 2 1
Rule 3 6
SET II Violations
Rule 1 2
Rule 2 3
Rule 3 6
SET III Violations
Rule 1 2
Rule 2 4
Rule 3 8
SET IV Violations
Rule 1 2
Rule 2 5
Rule 3 8
To add sum
of violations, list comprehension is not longer viable option. This time we'll use a for
loop, append results to a list, and concat
them at the end:
df_new = []
for df in [df1, df2, df3, df4]:
total = pd.Series(['Total', df.iloc[:, 1].sum()], name='sum')
blank = pd.Series(['', ''], name='blank')
df_new.append(df.T.reset_index().T.append(total).append(blank))
df_new = pd.concat(df_new, ignore_index=True).rename(columns={0:'SET', 1:'VIOLATION'})
df_new.to_excel('./workbookname.xls', index=False)
Upvotes: 4
Reputation: 42916
The reason you get this output, is the column names of each SET
column is different. You have to harmonize these names before using pd.concat
. Plus in pd.concat
you are using the wrong axis
.
dfs = [df1, df2, df3, df4]
for d in dfs:
d.columns = [col[:3] if 'SET' in col else col for col in d.columns]
df_all = pd.concat(dfs, ignore_index=True)
Which yields:
print(df_all)
SET Violations
0 Rule 1 1
1 Rule 2 1
2 Rule 3 6
3 Rule 1 2
4 Rule 2 3
5 Rule 3 6
6 Rule 1 2
7 Rule 2 4
8 Rule 3 8
9 Rule 1 2
10 Rule 2 5
11 Rule 3 8
Upvotes: 1