Reputation: 4298
I want to iterate through data.table
, just as purrr::map
does. While I was able to apply data.table
functions by converting data.frame
to data.table
inside purrr::map
, I want to know whether data.table
has something built-in that would abnegate using purrr::map
. I am asking this because I am unsure of purrr::map
's performance in terms of both speed and memory required. I have been disappointed with dplyr
's speed and memory utilization, as compared to data.table
when dealing with large datasets.
I researched stackoverflow and found that accepted answer on Iterate through data tables thread has used for
loop. I am not a big fan of for
loop for performance reasons.
Here's sample data file:
dput(Input_File)
structure(list(Zone = c("East", "East", "East", "East", "East",
"East", "East", "West", "West", "West", "West", "West", "West",
"West"), Fiscal.Year = c(2016, 2016, 2016, 2016, 2016, 2016,
2017, 2016, 2016, 2016, 2017, 2017, 2018, 2018), Transaction.ID = c(132,
133, 134, 135, 136, 137, 171, 171, 172, 173, 175, 176, 177, 178
), L.Rev = c(3, 0, 0, 1, 0, 0, 2, 1, 1, 2, 2, 1, 2, 1), L.Qty = c(3,
0, 0, 1, 0, 0, 1, 1, 1, 2, 2, 1, 2, 1), A.Rev = c(0, 0, 0, 1,
1, 1, 0, 0, 0, 0, 0, 1, 0, 0), A.Qty = c(0, 0, 0, 2, 2, 3, 0,
0, 0, 0, 0, 3, 0, 0), I.Rev = c(4, 4, 4, 0, 1, 0, 3, 0, 0, 0,
1, 0, 1, 1), I.Qty = c(2, 2, 2, 0, 1, 0, 3, 0, 0, 0, 1, 0, 1,
1)), .Names = c("Zone", "Fiscal.Year", "Transaction.ID", "L.Rev",
"L.Qty", "A.Rev", "A.Qty", "I.Rev", "I.Qty"), row.names = c(NA,
14L), class = "data.frame")
Here's sample code with purrr::map
and data.table
UZone <- unique(Input_File$Zone)
FYear <- unique(Input_File$Fiscal.Year)
a<-purrr::map(UZone, ~ dplyr::filter(Input_File, Zone == .)) %>%
purrr::map(~ data.table::as.data.table(.)) %>%
purrr::map(~ .[,.(sum = sum(L.Rev)),by=Fiscal.Year])
I am not too concerned about the output, but I want to know what alternatives are available to iterate through data.table
based on a specific column. I'd appreciate any thoughts.
Upvotes: 5
Views: 3932
Reputation: 42592
I'm not sure what's behind the question but I do prefer
library(data.table)
setDT(Input_File)[, .(sum = sum(L.Rev)), by = .(Zone, Fiscal.Year)]
Zone Fiscal.Year sum 1: East 2016 4 2: East 2017 2 3: West 2016 4 4: West 2017 3 5: West 2018 3
over OP's approach returning a
as
[[1]] Fiscal.Year sum 1: 2016 4 2: 2017 2 [[2]] Fiscal.Year sum 1: 2016 4 2: 2017 3 3: 2018 3
Upvotes: 1
Reputation: 3055
Piping data tables can be very well done by repeating []
, e.g. DT[][][]
. For lists, I think there's no alternatives to magrittr
. The rest can be completed by chaining lapply
library(data.table)
library(magrittr)
Input_File <- data.table(Input_File)
UZone <- unique(Input_File$Zone)
FYear <- unique(Input_File$Fiscal.Year)
lapply(UZone, function(x) Input_File[Zone==x]) %>%
lapply(function(x) x[,.(sum=sum(L.Rev)), by=Fiscal.Year])
In case you want to iterate over columns, you may want to have a look at this solution
UPDATE: I guess there could be a cleaner solution without importing magrittr
and without $
subsetting
library(data.table)
Input_File <- data.table(Input_File)
by_zone_lst <- lapply(Input_File[,unique(Zone)], function(x) Input_File[Zone==x])
summary_lst <- lapply(by_zone_lst, function(y) y[,.(sum=sum(L.Rev)), by=Fiscal.Year])
summary_lst
Upvotes: 2