Iano
Iano

Reputation: 15

How to sum multiple rows in R

I have a dataset of 2.5M entries. Trying to do something fairly basic on that data just adding all rows of the same time. I have tried SUM and dcast but I'm not sure it is doing what I want.

date,time,available,free,total
d1,t1,1,2,3
d1,t1,2,3,4
d1,t1,3,4,5
d1,d2,2,3,4
d1,d3,4,3,2
d2,t1,2,3,4
...

I want to sum all the available, free and total figures for each d&t distinct date and time

Tried things like info <- dcast(availwet, scrappedDate + scrappedTime ~ ., sum) but that only works if I change the data to have one of available, free or total in the df.

Upvotes: 1

Views: 8225

Answers (1)

akrun
akrun

Reputation: 887851

For a big dataset, I would use either data.table or dplyr methods.

Using data.table, we convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'date' and 'time' column, we loop through the remaining column with lapply and get the sum

library(data.table)
setDT(df1)[,lapply(.SD, sum) ,.(date, time)]

Or a similar option using dplyr, we group with the grouping columns and use summarise_each to get the sum of the remaining columns.

library(dplyr)
df1 %>%
   group_by(date, time) %>%
   summarise_each(funs(sum))

Or the formula method with aggregate (a base R option suggested by @David Arenburg). We place the grouping columns on the RHS of ~ and on the LHS we can simply use . to signify the rest of the columns. It is a convenient function though it might be slower for big datasets.

aggregate(.~ date + time, df1, FUN= sum)

Upvotes: 3

Related Questions