Al Merchant

Reputation: 107

Python Pandas DataFrames

I simply want to add DataFrames that are stored in a dictionary. Intuitively, I would loop over the dictionary, but I do not have an initial DataFrame of zeros to accumulate into. What is the most elegant way to do this? Currently I am doing the following:

dict = {'B': df1, 'C': df2, 'D': df3}

total = dict['B'] + dict['C'] + dict['D']

The DataFrames are read from CSV files, and there could be more than three.

How can I accomplish this in a loop?

Upvotes: 0

Views: 67

Answers (4)

Al Merchant

Reputation: 107

Just for completeness, here is code that demonstrates the problem and the solution:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.rand(3,2))
df2 = pd.DataFrame(np.random.rand(3,2))
df3 = pd.DataFrame(np.random.rand(3,2))
df4 = pd.DataFrame(np.random.rand(3,2))

d = {'a': df1, 'b': df2, 'c': df3, 'd': df4}

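total = None  # nothing accumulated yet

for key, df in d.items():
    if total is None:
        # First frame: take a copy, because DataFrame += works in place
        # and would otherwise overwrite df1 on the next iteration
        total = df.copy()
    else:
        total += df

print(total)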
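Since the loop exists only to avoid needing a zero-filled starting DataFrame, a shorter sketch uses functools.reduce, which takes the first value as its starting point. The example frames below are random 3x2 frames shaped like the ones above:

```python
import functools
import operator

import numpy as np
import pandas as pd

# Example frames shaped like the ones in the question (random data, 3x2)
d = {key: pd.DataFrame(np.random.rand(3, 2)) for key in "abcd"}

# reduce() seeds itself with the first value, so no zero DataFrame is needed;
# operator.add returns a new DataFrame at each step, so the inputs are untouched
total = functools.reduce(operator.add, d.values())

print(total)
```

Note that reduce raises TypeError on an empty dict unless an initial value is passed as its third argument.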

Upvotes: 0

albert

Reputation: 8623

Assuming you want to add these DataFrames (rather than concatenate them, as shown in another answer), you could use something like the following:

#!/usr/bin/env python3
# coding: utf-8

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.rand(3,2))
df2 = pd.DataFrame(np.random.rand(3,2))
df3 = pd.DataFrame(np.random.rand(3,2))
df4 = pd.DataFrame(np.random.rand(3,2))

d = {'a': df1, 'b': df2, 'c': df3, 'd': df4}
total = 0  # 0 + DataFrame returns a new DataFrame, so df1 is not modified

for key, df in d.items():
    total += df

print(total)
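The same idea fits in one call: Python's built-in sum() also starts from 0, so it behaves like the loop with total = 0. A minimal sketch, again with random example frames:

```python
import numpy as np
import pandas as pd

# Example frames shaped like the ones above (random data, 3x2)
d = {key: pd.DataFrame(np.random.rand(3, 2)) for key in "abcd"}

# sum() starts from 0, exactly like `total = 0` above:
# 0 + df1 returns a new DataFrame (int defers to DataFrame.__radd__),
# and each further frame is added element-wise by label
total = sum(d.values())

print(total)
```

One difference to keep in mind: for an empty dict, sum() returns the integer 0 rather than a DataFrame.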

Upvotes: 1

JoeCondron

Reputation: 8906

You could create a panel and then sum:

pd.Panel(dict).sum()

On a side note, it's not good practice to shadow the built-in dict type by using dict as a variable name.
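pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so the line above fails on current pandas. A sketch of a Panel-free alternative, assuming the goal is the element-wise sum across the frames, stacks them with pd.concat and groups on the original row labels (the frames here are random examples):

```python
import numpy as np
import pandas as pd

# Example frames (random data, 3x2); d avoids shadowing the built-in dict
d = {key: pd.DataFrame(np.random.rand(3, 2)) for key in "abcd"}

# concat with a dict stacks the frames and uses the dict keys as the outer
# level of a MultiIndex: (key, original row label)
stacked = pd.concat(d)

# Grouping on the inner level (the original row labels) and summing
# adds the frames element-wise
total = stacked.groupby(level=1).sum()

print(total)
```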

Upvotes: 0

EdChum

Reputation: 394459

You can pass the dict values to concat, for example:

In [195]:
d = {}
d['a'] = pd.DataFrame({'a':np.arange(5)})
d['b'] = pd.DataFrame({'b':np.arange(5)})
total = pd.concat(d.values(), axis=1)
total.sum()

Out[195]:
a    10
b    10
dtype: int64

Upvotes: 1
