Serdia

Reputation: 4418

How to sum columns from different DataFrames into a single DataFrame in pandas

Sample data

import pandas as pd

df1 = pd.DataFrame() 
df1["Col1"] = [0,2,4,6,2] 
df1["Col2"] = [5,1,3,4,0]
df1["Col3"] = [8,0,5,1,7]
df1["Col4"] = [1,4,6,0,8]
#df1_new = df1.iloc[:, 1:3]

df2 = pd.DataFrame() 
df2["Col1"] = [8,2,4,6,2,3,5] 
df2["Col2"] = [3,7,3,4,0,6,8]
df2["Col3"] = [5,0,5,1,7,9,1]
df2["Col4"] = [0,4,6,0,8,6,0]
#df2_new = df2.iloc[:, 1:3]

dataframes = [df1, df2]

for df in dataframes:
    df_new=df.iloc[:, 1:3]
    print(df_new.sum(axis=0))

The result from the code above looks like this:

Col2    13  
Col3    21  
dtype: int64
Col2    31  
Col3    28  
dtype: int64

But how can I sum both DataFrames and put the result into a single DataFrame?

The result should be a single DataFrame with one row that holds the combined totals of Col2 and Col3.

My real code looks like this:

import glob
import os

import pandas as pd

xlsx_files = glob.glob(os.path.join(path, "*.xlsx"))
#print(xlsx_files)

# loop over the list of xlsx files
for f in xlsx_files:
    # create df from each excel file
    dfs = pd.read_excel(f)
    # grab file name to use it in summarized df
    file_name = f.split("\\")[-1]
    new_df = pd.concat([dfs]).iloc[:,13:28].sum()

Upvotes: 1

Views: 90

Answers (4)

Nick

Reputation: 147146

You can either sum the dataframes separately and then add the results, or sum the concatenated dataframes:

df1.iloc[:,1:3].sum() + df2.iloc[:,1:3].sum()

pd.concat([df1,df2]).iloc[:,1:3].sum()

In both cases the result is

Col2    44
Col3    49
dtype: int64

You can convert the resulting Series to a DataFrame and transpose it with

.to_frame().T

to get this output:

   Col2  Col3
0    44    49
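A minimal sketch that applies the `pd.concat` approach to the whole `dataframes` list from the question, so it works for any number of frames. It assumes every frame has the same column layout, because `iloc[:, 1:3]` selects columns by position rather than by name.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "Col1": [0, 2, 4, 6, 2],
    "Col2": [5, 1, 3, 4, 0],
    "Col3": [8, 0, 5, 1, 7],
    "Col4": [1, 4, 6, 0, 8],
})
df2 = pd.DataFrame({
    "Col1": [8, 2, 4, 6, 2, 3, 5],
    "Col2": [3, 7, 3, 4, 0, 6, 8],
    "Col3": [5, 0, 5, 1, 7, 9, 1],
    "Col4": [0, 4, 6, 0, 8, 6, 0],
})
dataframes = [df1, df2]

# Stack all frames vertically, keep the 2nd and 3rd columns by position,
# sum each column, then turn the resulting Series into a one-row DataFrame
result = pd.concat(dataframes, ignore_index=True).iloc[:, 1:3].sum().to_frame().T
print(result)
```

Adding a third DataFrame only means appending it to `dataframes`; the expression itself does not change.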

For the code in your updated question, you probably want something like this:

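import glob
import os

import pandas as pd

xlsx_files = glob.glob(os.path.join(path, "*.xlsx"))

# read every Excel file and collect the DataFrames in a list
frames = []
for f in xlsx_files:
    # create a df from each Excel file
    dfs = pd.read_excel(f)
    # grab the file name to use it in the summarized df
    file_name = f.split("\\")[-1]
    frames.append(dfs)

# concatenate once (cheaper than growing a DataFrame inside the loop),
# then sum the columns at positions 13 to 27 (the iloc end is exclusive)
new_df = pd.concat(frames)
result = new_df.iloc[:,13:28].sum()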
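If the Excel files are large, one possible variation is to sum each file as it is read and only accumulate the small per-file totals, so the data never has to be concatenated in memory. The sketch below is an illustration under assumptions: `sum_column_range` is a hypothetical helper (not part of pandas), and in the real code it would be fed a generator such as `(pd.read_excel(f) for f in xlsx_files)` with `start=13, stop=28`. In-memory frames stand in for the files here.

```python
import pandas as pd


def sum_column_range(frames, start, stop):
    """Hypothetical helper: add up the columns at positions start..stop-1 across frames.

    Each frame is summed on its own and only the per-frame totals are kept.
    Returns None if `frames` is empty.
    """
    total = None
    for frame in frames:
        partial = frame.iloc[:, start:stop].sum()
        # Series addition aligns on column name; fill_value=0 keeps columns
        # that appear in only some of the frames instead of producing NaN
        total = partial if total is None else total.add(partial, fill_value=0)
    return total


# Stand-ins for pd.read_excel(f) results, using the sample data from the question
df1 = pd.DataFrame({"Col1": [0, 2, 4, 6, 2], "Col2": [5, 1, 3, 4, 0],
                    "Col3": [8, 0, 5, 1, 7], "Col4": [1, 4, 6, 0, 8]})
df2 = pd.DataFrame({"Col1": [8, 2, 4, 6, 2, 3, 5], "Col2": [3, 7, 3, 4, 0, 6, 8],
                    "Col3": [5, 0, 5, 1, 7, 9, 1], "Col4": [0, 4, 6, 0, 8, 6, 0]})

totals = sum_column_range([df1, df2], 1, 3)
print(totals.to_frame().T)
```

As with the `iloc`-based answers, this assumes the columns at positions 13 to 27 mean the same thing in every file.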

Upvotes: 1

constantstranger

Reputation: 9379

Here is one way:

long, short = (df1, df2) if len(df1.index) > len(df2.index) else (df2, df1)
print((short[["Col2", "Col3"]].reindex(long.index, fill_value=0) + long[["Col2", "Col3"]]).sum().to_frame().T)

Or, if you need to use iloc for the columns, here is another way:

long, short = (df1, df2) if len(df1.index) > len(df2.index) else (df2, df1)
print((short.iloc[:, 1:3].reindex(long.index, fill_value=0) + long.iloc[:, 1:3]).sum().to_frame().T)

Output (same for both):

   Col2  Col3
0    44    49
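A related option, swapped in here rather than taken from the answer above, is `DataFrame.add` with `fill_value=0`. It aligns the two frames on their row index and treats rows missing from the shorter frame as 0, so there is no need to work out which frame is longer first. A minimal sketch with the question's sample data:

```python
import pandas as pd

df1 = pd.DataFrame({"Col1": [0, 2, 4, 6, 2], "Col2": [5, 1, 3, 4, 0],
                    "Col3": [8, 0, 5, 1, 7], "Col4": [1, 4, 6, 0, 8]})
df2 = pd.DataFrame({"Col1": [8, 2, 4, 6, 2, 3, 5], "Col2": [3, 7, 3, 4, 0, 6, 8],
                    "Col3": [5, 0, 5, 1, 7, 9, 1], "Col4": [0, 4, 6, 0, 8, 6, 0]})

# Align on the row index (outer join); values missing on one side count as 0
combined = df1[["Col2", "Col3"]].add(df2[["Col2", "Col3"]], fill_value=0)

# Column totals as a one-row DataFrame
result = combined.sum().to_frame().T
print(result)
```

Because the alignment step introduces missing values before they are filled, the totals may come back as floats (for example 44.0) rather than integers.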

Upvotes: 0

BrokenBenchmark

Reputation: 19223

Take the middle two columns of each DataFrame, compute their column-wise sums, and add the two results together. Then transpose the result to turn the summed values into a single row:

pd.DataFrame((df1.iloc[:, 1:3].sum() + df2.iloc[:, 1:3].sum())).T

This outputs:

   Col2  Col3
0    44    49

Upvotes: 0

Naveed

Reputation: 11650

Here is another way to do it: add the column sums of the individual DataFrames, convert the result to a DataFrame, transpose it, and then select Col2 and Col3:

(df1.sum() + df2.sum()).to_frame().T[['Col2','Col3']]

   Col2  Col3
0    44    49
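The same idea extends to any number of DataFrames with Python's built-in `sum`: it starts from 0, and `0 + Series` yields a Series, so the per-frame column sums simply accumulate. A minimal sketch, assuming all frames share the same column names (a column missing from one frame would produce NaN in the total):

```python
import pandas as pd

df1 = pd.DataFrame({"Col1": [0, 2, 4, 6, 2], "Col2": [5, 1, 3, 4, 0],
                    "Col3": [8, 0, 5, 1, 7], "Col4": [1, 4, 6, 0, 8]})
df2 = pd.DataFrame({"Col1": [8, 2, 4, 6, 2, 3, 5], "Col2": [3, 7, 3, 4, 0, 6, 8],
                    "Col3": [5, 0, 5, 1, 7, 9, 1], "Col4": [0, 4, 6, 0, 8, 6, 0]})
dataframes = [df1, df2]

# Add up the column sums of every frame, then keep Col2 and Col3 as one row
totals = sum(df.sum() for df in dataframes)
result = totals[["Col2", "Col3"]].to_frame().T
print(result)
```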

Upvotes: 0
