Rick

Reputation: 45281

Add many pandas dataframes with overlapping indexes and columns

What is the best way in pandas to combine many dataframes, adding together the values at the overlapping rows and columns?

I have a long list of overlapping dataframes that look like this (only three of them are shown):

>>> df1
     xcode  ycode
n
17       1      0
18       1      0
19       1      0
20       1      0
21       1      0
389      1      0
390      1      0
391      1      0
392      1      0
393      1      0
394      1      0
>>> df2
     xcode  ycode
n
58       1      0
59       1      0
60       1      0
61       1      0
62       1      0
610      1      0
611      1      0
612      1      0
613      1      0
614      1      0
615      1      0
>>> df3
    xcode  ycode
n
21      0      1
22      0      1
23      0      1
24      0      1
25      0      1
26      0      1
27      0      1
28      0      1
29      0      1
30      0      1
31      0      1
32      0      1
33      0      1
34      0      1
35      0      1
36      0      1
37      0      1
38      0      1
39      0      1
40      0      1
41      0      1
42      0      1
43      0      1
44      0      1
45      0      1
46      0      1
47      0      1
48      0      1
49      0      1
50      0      1
51      0      1
52      0      1
53      0      1
54      0      1
55      0      1
56      0      1
57      0      1
58      0      1

I can combine these, adding together the values at the overlapping rows and columns, in the following way, but the chain grows by one call for every dataframe:

>>> pd.DataFrame().add(df1, fill_value=0).add(df2, fill_value=0).add(df3, fill_value=0)
     xcode  ycode
n
17       1      0
18       1      0
19       1      0
20       1      0
21       1      1    # Note the overlapping index
389      1      0
390      1      0
391      1      0
392      1      0
393      1      0
394      1      0
58       1      1    # Note the overlapping index
59       1      0
60       1      0
61       1      0
62       1      0
610      1      0
611      1      0
612      1      0
613      1      0
614      1      0
615      1      0
22       0      1
23       0      1
24       0      1
25       0      1
26       0      1
27       0      1
28       0      1
29       0      1
30       0      1
31       0      1
32       0      1
33       0      1
34       0      1
35       0      1
36       0      1
37       0      1
38       0      1
39       0      1
40       0      1
41       0      1
42       0      1
43       0      1
44       0      1
45       0      1
46       0      1
47       0      1
48       0      1
49       0      1
50       0      1
51       0      1
52       0      1
53       0      1
54       0      1
55       0      1
56       0      1
57       0      1

I could do the same thing using a loop, or reduce. But does the pandas library provide a better way to do this?
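For reference, the reduce version mentioned above might look like the sketch below. The three small frames are hypothetical stand-ins for the real data, and `frames` and `total` are names chosen for illustration only.

```python
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for the dataframes in the question.
df1 = pd.DataFrame({"xcode": [1, 1], "ycode": [0, 0]},
                   index=pd.Index([20, 21], name="n"))
df2 = pd.DataFrame({"xcode": [1, 1], "ycode": [0, 0]},
                   index=pd.Index([58, 59], name="n"))
df3 = pd.DataFrame({"xcode": [0, 0], "ycode": [1, 1]},
                   index=pd.Index([21, 58], name="n"))

frames = [df1, df2, df3]

# Fold the list pairwise; fill_value=0 treats a label that is missing
# from one side as 0 instead of producing NaN.
total = reduce(lambda acc, df: acc.add(df, fill_value=0), frames)
print(total)
```

This works for any number of frames, but it builds a new intermediate dataframe at every step, which is part of why a built-in alternative would be preferable.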

(I searched, but could not find a similar question; other questions either have identical indexes, or only a couple of dataframes being added.)

Upvotes: 1

Views: 176

Answers (2)

Andy L.

Reputation: 25259

Use pd.concat to stack the dataframes, then sum with level=0 to add up the rows that share an index label.

>>> pd.concat([df1, df2, df3]).sum(level=0)
     xcode  ycode
n
17       1      0
18       1      0
19       1      0
20       1      0
21       1      1
389      1      0
390      1      0
391      1      0
392      1      0
393      1      0
394      1      0
58       1      1
59       1      0
60       1      0
61       1      0
62       1      0
610      1      0
611      1      0
612      1      0
613      1      0
614      1      0
615      1      0
22       0      1
23       0      1
24       0      1
25       0      1
26       0      1
27       0      1
28       0      1
29       0      1
30       0      1
31       0      1
32       0      1
33       0      1
34       0      1
35       0      1
36       0      1
37       0      1
38       0      1
39       0      1
40       0      1
41       0      1
42       0      1
43       0      1
44       0      1
45       0      1
46       0      1
47       0      1
48       0      1
49       0      1
50       0      1
51       0      1
52       0      1
53       0      1
54       0      1
55       0      1
56       0      1
57       0      1
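Note that the `level` argument of `DataFrame.sum` was deprecated in pandas 1.3 and removed in pandas 2.0. On newer versions the same idea can be written with `groupby(level=0)`. The following is a minimal sketch using small hypothetical frames rather than the data from the question:

```python
import pandas as pd

# Hypothetical stand-ins for the dataframes in the question.
df1 = pd.DataFrame({"xcode": [1, 1], "ycode": [0, 0]},
                   index=pd.Index([20, 21], name="n"))
df2 = pd.DataFrame({"xcode": [1, 1], "ycode": [0, 0]},
                   index=pd.Index([58, 59], name="n"))
df3 = pd.DataFrame({"xcode": [0, 0], "ycode": [1, 1]},
                   index=pd.Index([21, 58], name="n"))

# Stack all frames, then sum the rows that share an index label.
# sort=False keeps labels in order of first appearance.
combined = pd.concat([df1, df2, df3]).groupby(level=0, sort=False).sum()
print(combined)
```

By default `groupby` sorts the group keys, so drop `sort=False` if a sorted index is preferred.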

Upvotes: 2

Kyle

Reputation: 2894

You should be able to use standard Python operators:

sum([df1, df2, df3])

or

df1 + df2 + df3
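One caveat worth checking: the `+` operator aligns on the union of the labels and leaves NaN wherever a label is missing from either operand, so it only matches the desired result when the indexes are identical. A small sketch with hypothetical frames shows the difference from `add(..., fill_value=0)`:

```python
import pandas as pd

# Hypothetical frames that overlap only at index label 21.
df1 = pd.DataFrame({"xcode": [1, 1], "ycode": [0, 0]},
                   index=pd.Index([20, 21], name="n"))
df3 = pd.DataFrame({"xcode": [0, 0], "ycode": [1, 1]},
                   index=pd.Index([21, 58], name="n"))

# Plain addition: labels 20 and 58 exist in only one frame, so they become NaN.
plain = df1 + df3

# add() with fill_value=0 treats the missing side as 0 instead.
filled = df1.add(df3, fill_value=0)

print(plain)
print(filled)
```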

Upvotes: -1
