Reputation: 45281
What is the best way provided by pandas to combine many dataframes, and perform mathematical addition of the values at the overlapping rows and columns?
I have a long list of overlapping dataframes that look like this (this is showing only 3 of them):
>>> df1
xcode ycode
n
17 1 0
18 1 0
19 1 0
20 1 0
21 1 0
389 1 0
390 1 0
391 1 0
392 1 0
393 1 0
394 1 0
>>> df2
xcode ycode
n
58 1 0
59 1 0
60 1 0
61 1 0
62 1 0
610 1 0
611 1 0
612 1 0
613 1 0
614 1 0
615 1 0
>>> df3
xcode ycode
n
21 0 1
22 0 1
23 0 1
24 0 1
25 0 1
26 0 1
27 0 1
28 0 1
29 0 1
30 0 1
31 0 1
32 0 1
33 0 1
34 0 1
35 0 1
36 0 1
37 0 1
38 0 1
39 0 1
40 0 1
41 0 1
42 0 1
43 0 1
44 0 1
45 0 1
46 0 1
47 0 1
48 0 1
49 0 1
50 0 1
51 0 1
52 0 1
53 0 1
54 0 1
55 0 1
56 0 1
57 0 1
58 0 1
I can combine these, adding together the values at the overlapping rows and columns, in the following way, but it is a long bit of code:
>>> pd.DataFrame().add(df1, fill_value=0).add(df2, fill_value=0).add(df3, fill_value=0)
xcode ycode
n
17 1 0
18 1 0
19 1 0
20 1 0
21 1 1 # Note the overlapping index
389 1 0
390 1 0
391 1 0
392 1 0
393 1 0
394 1 0
58 1 1 # Note the overlapping index
59 1 0
60 1 0
61 1 0
62 1 0
610 1 0
611 1 0
612 1 0
613 1 0
614 1 0
615 1 0
22 0 1
23 0 1
24 0 1
25 0 1
26 0 1
27 0 1
28 0 1
29 0 1
30 0 1
31 0 1
32 0 1
33 0 1
34 0 1
35 0 1
36 0 1
37 0 1
38 0 1
39 0 1
40 0 1
41 0 1
42 0 1
43 0 1
44 0 1
45 0 1
46 0 1
47 0 1
48 0 1
49 0 1
50 0 1
51 0 1
52 0 1
53 0 1
54 0 1
55 0 1
56 0 1
57 0 1
I could do the same thing using a loop or reduce. But does the pandas library provide a better way to do this?
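For reference, the reduce variant mentioned above might look roughly like the sketch below. The three small frames are hypothetical stand-ins for the much longer ones in the question; only the index name n and the column names xcode and ycode are taken from it.

```python
from functools import reduce

import pandas as pd

# Hypothetical, shortened stand-ins for the frames in the question.
df1 = pd.DataFrame({"xcode": [1, 1], "ycode": [0, 0]},
                   index=pd.Index([20, 21], name="n"))
df2 = pd.DataFrame({"xcode": [1, 1], "ycode": [0, 0]},
                   index=pd.Index([58, 59], name="n"))
df3 = pd.DataFrame({"xcode": [0, 0, 0], "ycode": [1, 1, 1]},
                   index=pd.Index([21, 22, 58], name="n"))

frames = [df1, df2, df3]

# Fold the list pairwise. fill_value=0 treats a label that is missing
# from one side as 0 instead of producing NaN.
total = reduce(lambda left, right: left.add(right, fill_value=0), frames)
print(total)
```

This works for any number of frames, but each step builds an intermediate frame and the result is upcast to float, which is part of why a single-pass alternative is attractive.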
(I searched, but could not find a similar question; other questions either have identical indexes, or only a couple of dataframes being added.)
Upvotes: 1
Views: 176
Reputation: 25259
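Use pd.concat and sum with level=0. Note that the level argument of sum was deprecated in pandas 1.3 and removed in pandas 2.0; on those versions, groupby(level=0).sum() is the replacement.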
>>> pd.concat([df1, df2, df3]).sum(level=0)
xcode ycode
n
17 1 0
18 1 0
19 1 0
20 1 0
21 1 1
389 1 0
390 1 0
391 1 0
392 1 0
393 1 0
394 1 0
58 1 1
59 1 0
60 1 0
61 1 0
62 1 0
610 1 0
611 1 0
612 1 0
613 1 0
614 1 0
615 1 0
22 0 1
23 0 1
24 0 1
25 0 1
26 0 1
27 0 1
28 0 1
29 0 1
30 0 1
31 0 1
32 0 1
33 0 1
34 0 1
35 0 1
36 0 1
37 0 1
38 0 1
39 0 1
40 0 1
41 0 1
42 0 1
43 0 1
44 0 1
45 0 1
46 0 1
47 0 1
48 0 1
49 0 1
50 0 1
51 0 1
52 0 1
53 0 1
54 0 1
55 0 1
56 0 1
57 0 1
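On pandas 2.0 and later, where sum no longer accepts level, the same idea can be written with groupby. The sketch below uses small hypothetical frames rather than the full data from the question; sort=False is an assumption made here to keep rows in order of first appearance, as in the output above.

```python
import pandas as pd

# Hypothetical, shortened stand-ins for the frames in the question.
df1 = pd.DataFrame({"xcode": [1, 1], "ycode": [0, 0]},
                   index=pd.Index([20, 21], name="n"))
df2 = pd.DataFrame({"xcode": [1, 1], "ycode": [0, 0]},
                   index=pd.Index([58, 59], name="n"))
df3 = pd.DataFrame({"xcode": [0, 0, 0], "ycode": [1, 1, 1]},
                   index=pd.Index([21, 22, 58], name="n"))

# Stack all rows, then sum the rows that share an index label.
# sort=False keeps labels in order of first appearance instead of sorting.
combined = pd.concat([df1, df2, df3]).groupby(level=0, sort=False).sum()
print(combined)
```

Dropping sort=False returns the rows sorted by n, which may be preferable when the original order carries no meaning.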
Upvotes: 2
Reputation: 2894
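You should be able to use standard Python operators, although these only give complete totals when every frame has the same index, because a label missing from any operand becomes NaN in the result: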
sum([df1, df2, df3])
or
df1 + df2 + df3
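One caveat worth checking before relying on this: the + operator aligns on the index and yields NaN for any label that is not present in both operands. The sketch below, using two small hypothetical frames, contrasts that with add and fill_value=0.

```python
import pandas as pd

# Hypothetical frames that overlap only at label 21.
df1 = pd.DataFrame({"xcode": [1, 1]}, index=[20, 21])
df3 = pd.DataFrame({"xcode": [0, 0]}, index=[21, 22])

# Plain addition: labels 20 and 22 exist on one side only, so they become NaN.
plain = df1 + df3

# add with fill_value=0: a label missing from one side is treated as 0.
filled = df1.add(df3, fill_value=0)

print(plain)
print(filled)
```

The built-in sum([df1, df2, df3]) applies + repeatedly, so it behaves the same way on partially overlapping indexes.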
Upvotes: -1