Reputation: 173
I have a list of pandas dataframes, each with two columns, basically a class and a value:
df1:
Name | Count |
---|---|
Bob | 10 |
John | 20 |
df2:
Name | Count |
---|---|
Mike | 30 |
Bob | 40 |
The same "Name" values may appear in different dataframes, or there may be no overlap at all, and the list contains over 100 dataframes. Within each dataframe, however, all "Name" values are unique.
What I need is to iterate over all the dataframes and build one big dataframe that contains every "Name" value together with the total of its "Count" across all the dataframes, like this:
result:
Name | Count |
---|---|
Bob | 50 |
John | 20 |
Mike | 30 |
Bob's counts are summed; the others are unchanged because they appear only once. Is there an efficient way to do this when there are many dataframes?
Upvotes: 1
Views: 812
Reputation: 5745
You can do the following. Since some names are contained in only one dataframe, use fill_value=0 so that they still get a value instead of NaN:
>>> df1.set_index('Name').add(df2.set_index('Name'), fill_value=0).reset_index()
   Name  Count
0   Bob   50.0
1  John   20.0
2  Mike   30.0
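The one-liner above handles two dataframes. For a whole list, one possible extension is to fold the same `add(..., fill_value=0)` call over the list with `functools.reduce`. This is only a sketch: the list name `dfs`, the third sample frame, and the cast back to `int` are assumptions for illustration, not part of the original question.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample data: the question's df1 and df2 plus one extra frame.
dfs = [
    pd.DataFrame({"Name": ["Bob", "John"], "Count": [10, 20]}),
    pd.DataFrame({"Name": ["Mike", "Bob"], "Count": [30, 40]}),
    pd.DataFrame({"Name": ["Ann"], "Count": [5]}),
]

# Index every frame by Name so that add() aligns rows on the names.
indexed = [df.set_index("Name") for df in dfs]

# Add the frames pairwise; fill_value=0 keeps names that are missing on one side.
total = reduce(lambda acc, df: acc.add(df, fill_value=0), indexed)

# Alignment with missing names produces floats, so cast back to int (assumes integer counts).
result = total.astype(int).reset_index()
print(result)
```

Each step re-aligns the running total against the next frame, so with more than 100 dataframes this is likely slower than concatenating once and grouping.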
Upvotes: 2
Reputation: 645
Use pd.concat, then groupby:
df = pd.concat(dfs) # where dfs is a list of dataframes
Then you can do:
gp = df.groupby(['Name'])['Count'].sum()
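Put together as a runnable sketch, using the two sample frames from the question (the names `dfs`, `combined`, and `result` are just illustrative). Note that the `groupby(...)['Count'].sum()` call above returns a Series indexed by Name; passing `as_index=False` is one way to get back a two-column dataframe like the one the question asks for.

```python
import pandas as pd

# Sample data from the question; in practice dfs would hold 100+ dataframes.
dfs = [
    pd.DataFrame({"Name": ["Bob", "John"], "Count": [10, 20]}),
    pd.DataFrame({"Name": ["Mike", "Bob"], "Count": [30, 40]}),
]

# Stack all frames vertically; ignore_index avoids duplicate row labels.
combined = pd.concat(dfs, ignore_index=True)

# Sum Count per Name; as_index=False keeps Name as a regular column.
result = combined.groupby("Name", as_index=False)["Count"].sum()
print(result)
```

Because the concatenation happens once and the grouping is a single pass, this should scale reasonably well to a long list of dataframes.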
Upvotes: 3