Reputation: 607
Others have asked similar questions, but their data structure was a bit different. My data set has multiple columns for both grouping variables and numeric data. I need to sum the numeric data per row and output the sum to a new data column. See example DATA
set and the desired RESULTS
tables below. I prefer to find a solution using the mutate
function in dplyr
. I primary use the dplyr
package to manipulate my data sets. I can accomplish this task via gather
, group_by
, and sumarise
functions in dplyr
, but I am working with very large data sets which can cause “gathered” data tables to be in excess of 2,000,000 rows. Thanks in advance.
DATA = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
DATE = c("1","1","2","2","3","3","3","4","4"),
STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000))
RESULT = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
DATE = c("1","1","2","2","3","3","3","4","4"),
STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
SUM_STUFF = c(3, 6, 90, 120, 300, 600, 900, 15000, 18000))
Upvotes: 4
Views: 5595
Reputation: 3073
This looks like it meets your needs:
RESULT <- DATA %>%
rowwise() %>%
mutate(SUM_STUFF = sum(STUFF,STUFF2))
The key is to use rowwise
(which may have been added to to a version of dplyr
after you asked your question).
> RESULT
Source: local data frame [9 x 5]
Groups: <by row>
SITE DATE STUFF STUFF2 SUM_STUFF
(fctr) (fctr) (dbl) (dbl) (dbl)
1 A 1 1 2 3
2 A 1 2 4 6
3 A 2 30 60 90
4 A 2 40 80 120
5 B 3 100 200 300
6 B 3 200 400 600
7 B 3 300 600 900
8 C 4 5000 10000 15000
9 C 4 6000 12000 18000
Upvotes: 3