Reputation: 2597
I have a data table with this schema:
date, key_a, key_b, key_c, key_d, value
I'd like to make a list of lists that has this structure:
[1]
date, key_a, value
[2]
date, key_b, value
[3]
date, key_c, value
I'd like to aggregate my dt
into a list where each entry aggregates the dt
by a different one of the keys.
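For concreteness, a toy table with that schema might be built like this (the values are made up for illustration):
library(data.table)
# hypothetical example data matching the schema above
dt <- data.table(
  date  = as.Date(c("2021-01-01", "2021-01-01", "2021-01-02")),
  key_a = c("a1", "a1", "a2"),
  key_b = c("b1", "b2", "b1"),
  key_c = c("c1", "c1", "c2"),
  key_d = c("d1", "d2", "d1"),
  value = c(10, 20, 30)
)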
This is my code:
setDT(dt)
list_of_dts <- list()
list_of_dts[[1]] <- dt[, .(value = sum(value)), .(date, key_a)]
list_of_dts[[2]] <- dt[, .(value = sum(value)), .(date, key_b)]
...and so on for key_c and key_d.
Is there a more efficient way of solving this?
Upvotes: 2
Views: 245
Reputation: 887251
We can use the tidyverse:
library(dplyr)
library(tidyr)
dt %>%
  pivot_longer(cols = starts_with('key'), values_to = 'value1') %>%
  group_by(date, name) %>%
  summarise(value = sum(value1))
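If the end goal is the per-key list from the question, the summarised result can then be split on name afterwards (a sketch; res here stands for the result of the pipeline above assigned to a variable):
res_list <- split(res, res$name)
# res_list$key_a, res_list$key_b, ... each hold the date/value rows for one key column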
An option is to convert to disk.frame and do the group-by operations:
library(disk.frame)
dt %>%
  pivot_longer(cols = starts_with('key'), values_to = 'value1') %>%
  as.disk.frame() %>%
  group_by(date, name) %>%
  summarise(value = sum(value1)) %>%
  collect()
When there are multiple .csv files, they can be read directly with csv_to_disk.frame:
df <- csv_to_disk.frame(file.path(tempdir(), "df.csv"),
  inmapfn = function(chunk) {
    # convert date_str to Date and store it as "date"
    chunk[, date := as.Date(date_str, "%Y-%m-%d")]
    chunk[, date_str := NULL]
    # example of adding a derived column while reading
    chunk[, new := col1 + 5]
  })
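The call above reads a single file; as far as I recall, csv_to_disk.frame also accepts a vector of paths, so several CSVs with the same layout can be read in one call (the directory and pattern below are just placeholders):
library(disk.frame)
csv_files <- list.files(tempdir(), pattern = "\\.csv$", full.names = TRUE)
df <- csv_to_disk.frame(csv_files)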
Upvotes: 1
Reputation: 160447
Try this data.table-native attempt:
dt <- data.table(date=c(1,1,2), key_a=c(11,11,13), key_b=c(21,21,23), key_c=c(31,31,33), key_d=c(41,41,43), value=c(51,51,53))
keynames <- grep("^key", colnames(dt), value = TRUE)
othnames <- setdiff(colnames(dt), keynames)
keynames
# [1] "key_a" "key_b" "key_c" "key_d"
othnames
# [1] "date" "value"
The split, for raw data:
lapply(setNames(nm = keynames), function(kn) subset(dt, select = c(othnames, kn)))
# $key_a
# date value key_a
# 1: 1 51 11
# 2: 1 51 11
# 3: 2 53 13
# $key_b
# date value key_b
# 1: 1 51 21
# 2: 1 51 21
# 3: 2 53 23
# $key_c
# date value key_c
# 1: 1 51 31
# 2: 1 51 31
# 3: 2 53 33
# $key_d
# date value key_d
# 1: 1 51 41
# 2: 1 51 41
# 3: 2 53 43
Or for your aggregation issue:
lapply(setNames(nm = keynames), function(kn) dt[, .(value = sum(value)), by = c("date", kn)])
# $key_a
# date key_a value
# 1: 1 11 102
# 2: 2 13 53
# $key_b
# date key_b value
# 1: 1 21 102
# 2: 2 23 53
# $key_c
# date key_c value
# 1: 1 31 102
# 2: 2 33 53
# $key_d
# date key_d value
# 1: 1 41 102
# 2: 2 43 53
Upvotes: 1
Reputation: 389055
Maybe get the data in long format and then summarise it:
library(data.table)
setDT(dt)
dt1 <- melt(dt, id.vars = c('date', 'value'))
dt1 <- dt1[, .(value = sum(value)), .(date, variable)]
Now if you want a list of data frames, you can use split:
split(dt1, dt1$variable)
Upvotes: 3