Reputation: 23898
I want to translate the following R code, which uses tidytable, into collapse (Advanced and Fast Data Transformation).
tidytable Code
library(tidytable)
library(collapse)
Out1 <-
wlddev %>%
mutate_rowwise.(New1 = sum(c_across.(PCGDP:GINI), na.rm = TRUE))
Out1 %>%
select.(New1)
# A tidytable: 13,176 x 1
New1
<dbl>
1 32.4
2 33.0
3 33.5
4 34.0
5 34.5
6 34.9
7 35.4
8 35.9
9 36.4
10 36.9
# ... with 13,166 more rows
collapse Code
library(collapse)
Out2 <-
wlddev %>%
ftransform(New1 = fsum(across(PCGDP:GINI), na.rm = TRUE))
Error in `context_peek()`:
! `across()` must only be used inside dplyr verbs.
Run `rlang::last_error()` to see where the error occurred.
Any hints, please?
Upvotes: 0
Views: 202
Reputation: 1369
I wonder why you need to come up with something so complex. Base R has functions like rowSums, and kit provides parallel statistical functions such as psum:
library(collapse)
library(magrittr)
library(kit, include.only = "psum")
library(microbenchmark)
microbenchmark(
A = wlddev %>%
ftransform(New1 = rowSums(qM(slt(., PCGDP:GINI)), na.rm = TRUE)),
B = wlddev %>%
ftransform(New1 = psum(slt(., PCGDP:GINI), na.rm = TRUE)),
C = wlddev %>%
ftransform(New1 = psum(PCGDP, LIFEEX, GINI, na.rm = TRUE))
)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> A 68.88 97.8875 194.24037 102.2335 113.8775 4646.366 100
#> B 25.83 30.1350 35.43548 34.9115 38.6630 56.416 100
#> C 22.55 25.8095 29.99396 30.5860 32.9025 53.792 100
Created on 2022-02-05 by the reprex package (v2.0.1)
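As a quick sanity check on toy data, the base R sketch below suggests that rowSums(..., na.rm = TRUE) gives the same totals as a per-row sum(..., na.rm = TRUE), including for a row that is entirely NA. The data frame toy and its columns are made up for illustration and are not taken from wlddev.

```r
# Toy data (hypothetical); stands in for the PCGDP:GINI columns of wlddev
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c(10, 20, NA, NA),
  c = c(100, 200, 300, NA)
)

# Row-by-row sum, the base R analogue of a rowwise mutate
by_row <- unname(apply(as.matrix(toy), 1, sum, na.rm = TRUE))

# Vectorised equivalent; the all-NA fourth row sums to 0
vectorised <- unname(rowSums(toy, na.rm = TRUE))

print(vectorised)
stopifnot(isTRUE(all.equal(by_row, vectorised)))
```

The vectorised call avoids looping over rows in R, which is presumably why the rowSums and psum variants above are so much faster than a rowwise mutate.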
Upvotes: 4
Reputation: 23898
Taking a lead from @akrun's answer, I came up with a faster solution.
Out3 <-
wlddev %>%
slt(PCGDP:GINI) %>%
qDT() %>%
t %>%
fsum(.) %>%
ftransform(.data = wlddev, New1 = .) %>%
qDT() %>%
replace_NA(X = ., value = 0, cols = "New1")
Speed Comparison
library(microbenchmark)
microbenchmark(
Out1 =
wlddev %>%
mutate_rowwise.(New1 = sum(c_across.(PCGDP:GINI), na.rm = TRUE))
, Out2 =
wlddev %>%
slt(PCGDP:GINI) %>%
t %>%
as_tibble %>%
fsum(.) %>%
ftransform(wlddev, New1 = .)
, Out3 =
wlddev %>%
slt(PCGDP:GINI) %>%
qDT() %>%
t %>%
fsum(.) %>%
ftransform(.data = wlddev, New1 = .) %>%
qDT() %>%
replace_NA(X = ., value = 0, cols = "New1")
)
Unit: microseconds
expr min lq mean median uq max neval
Out1 72618.0 78268.75 81296.992 79888.50 81671.10 162397.8 100
Out2 33549.7 35520.75 37763.537 37728.25 39021.90 55001.3 100
Out3 241.2 310.85 360.225 357.40 387.35 780.1 100
Upvotes: 0
Reputation: 887213
According to ?fsum, fsum from collapse computes a column-wise sum:
fsum is a generic function that computes the (column-wise) sum of all values in x, (optionally) grouped by g and/or weighted by w (e.g. to calculate survey totals).
The tidytable code, however, sums row-wise. So one option is to select (slt) the columns of interest, transpose (t), convert to a tibble/data.frame, apply fsum, and create a new column:
library(collapse)
library(magrittr)  # provides %>%
library(tibble)    # provides as_tibble
Out2 <- wlddev %>%
slt(PCGDP:GINI) %>%
t %>%
as_tibble %>%
fsum(.) %>%
ftransform(wlddev, New1 = .)
Note that sum(..., na.rm = TRUE) returns 0 when all the elements are NA, whereas fsum, which uses na.rm = TRUE by default, returns NA in that case:
> fsum(c(NA, NA))
[1] NA
> sum(c(NA, NA), na.rm = TRUE)
[1] 0
Therefore, if we change the NA values to 0 in Out2, the output will be the same as the OP's Out1:
> Out2$New1[is.na(Out2$New1)] <- 0
> all.equal(Out1, Out2, check.attributes = FALSE)
[1] TRUE
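The transpose step and the NA replacement can be illustrated on toy data with base R only. This is a sketch under assumptions: toy is made-up data, and col_sum_na_if_empty() is a hypothetical helper that merely imitates the behaviour shown above for fsum (NA when every element is NA). It is not part of collapse.

```r
# Toy data (hypothetical); rows play the role of wlddev observations
toy <- data.frame(
  x = c(1, NA, NA),
  y = c(2, 5, NA),
  z = c(3, NA, NA)
)

# Hypothetical stand-in for a column-wise sum that skips NAs but returns
# NA for a column containing only NAs (the behaviour shown for fsum)
col_sum_na_if_empty <- function(m) {
  apply(m, 2, function(v) if (all(is.na(v))) NA_real_ else sum(v, na.rm = TRUE))
}

# After transposing, each original row is a column, so a column-wise
# sum yields the row-wise totals
row_totals <- unname(col_sum_na_if_empty(t(as.matrix(toy))))
print(row_totals)  # the all-NA row is still NA at this point

# Replace NA with 0 to match sum(..., na.rm = TRUE)
row_totals[is.na(row_totals)] <- 0
stopifnot(isTRUE(all.equal(row_totals, unname(rowSums(toy, na.rm = TRUE)))))
```

On real data the same idea applies, but transposing into thousands of columns can be costly, so a direct row-wise function may be preferable when one is available.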
Upvotes: 2