Reputation: 15
I have a dataset of 2.5M entries. Trying to do something fairly basic on that data just adding all rows of the same time. I have tried SUM and dcast but I'm not sure it is doing what I want.
date,time,available,free,total
d1,t1,1,2,3
d1,t1,2,3,4
d1,t1,3,4,5
d1,d2,2,3,4
d1,d3,4,3,2
d2,t1,2,3,4
...
I want to sum all the available, free and total figures for each d&t distinct date and time
Tried things like info <- dcast(availwet, scrappedDate + scrappedTime ~ ., sum)
but that only works if I change the data to have one of available, free or total in the df.
Upvotes: 1
Views: 8225
Reputation: 887851
For a big dataset, I would use either data.table
or dplyr
methods.
Using data.table
, we convert the 'data.frame' to 'data.table' (setDT(df1)
), grouped by 'date' and 'time' column, we loop through the remaining column with lapply
and get the sum
library(data.table)
setDT(df1)[,lapply(.SD, sum) ,.(date, time)]
Or a similar option using dplyr
, we group with the grouping columns and use summarise_each
to get the sum
of the remaining columns.
library(dplyr)
df1 %>%
group_by(date, time) %>%
summarise_each(funs(sum))
Or the formula
method with aggregate
(a base R
option suggested by @David Arenburg). We place the grouping columns on the RHS of ~
and on the LHS we can simply use .
to signify the rest of the columns. It is a convenient function though it might be slower for big datasets.
aggregate(.~ date + time, df1, FUN= sum)
Upvotes: 3