MYaseen208

Reputation: 23898

collapse: Modifying columns row-wise by combining values from multiple columns

I want to translate the following tidytable code to the collapse package (Advanced and Fast Data Transformation).

tidytable Code

library(tidytable)
library(collapse)
Out1 <- 
  wlddev %>% 
  mutate_rowwise.(New1 = sum(c_across.(PCGDP:GINI), na.rm = TRUE))
Out1 %>% 
  select.(New1)
# A tidytable: 13,176 x 1
    New1
   <dbl>
 1  32.4
 2  33.0
 3  33.5
 4  34.0
 5  34.5
 6  34.9
 7  35.4
 8  35.9
 9  36.4
10  36.9
# ... with 13,166 more rows

collapse Code

library(collapse)
Out2 <- 
  wlddev %>% 
  ftransform(New1 = fsum(across(PCGDP:GINI), na.rm = TRUE))

  Error in `context_peek()`:
  ! `across()` must only be used inside dplyr verbs.
  Run `rlang::last_error()` to see where the error occurred.

Any hints, please?

Upvotes: 0

Views: 202

Answers (3)

Sebastian

Reputation: 1369

I wonder why you need to come up with something so complex. You have functions like rowSums in base R, and you have parallel statistical functions in kit:

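library(collapse)
library(magrittr)
library(kit, include.only = "psum")
library(microbenchmark)

microbenchmark(
# A: base R rowSums on the selected columns, converted to a matrix with qM
A = wlddev %>%
  ftransform(New1 = rowSums(qM(slt(., PCGDP:GINI)), na.rm = TRUE)),
# B: kit::psum (parallel sum) on the selected columns
B = wlddev %>%
  ftransform(New1 = psum(slt(., PCGDP:GINI), na.rm = TRUE)),
# C: kit::psum with the columns named explicitly
C = wlddev %>%
  ftransform(New1 = psum(PCGDP, LIFEEX, GINI, na.rm = TRUE))
)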

#> Unit: microseconds
#>  expr   min      lq      mean   median       uq      max neval
#>     A 68.88 97.8875 194.24037 102.2335 113.8775 4646.366   100
#>     B 25.83 30.1350  35.43548  34.9115  38.6630   56.416   100
#>     C 22.55 25.8095  29.99396  30.5860  32.9025   53.792   100

Created on 2022-02-05 by the reprex package (v2.0.1)
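
For anyone who wants to check the base R route on something smaller than wlddev, here is a minimal sketch. The data frame and its column names (x, y, z) are made up for illustration and are not part of wlddev. It shows that rowSums with na.rm = TRUE gives 0 for a row that is entirely missing, which matches what sum(..., na.rm = TRUE) does in the tidytable version.

```r
# Toy data frame (hypothetical values); the third row is entirely missing
df <- data.frame(
  x = c(1, NA, NA),
  y = c(2, 3, NA),
  z = c(4, 5, NA)
)

# Columns to combine row by row
cols <- c("x", "y", "z")

# Row-wise sum that ignores missing values;
# the all-NA row ends up as 0 rather than NA
df$total <- rowSums(df[cols], na.rm = TRUE)

print(df)
```

The same pattern should carry over to wlddev by replacing the toy columns with the ones you need.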

Upvotes: 4

MYaseen208

Reputation: 23898

Taking a lead from @akrun's answer, I came up with a faster solution.

Out3 <- 
  wlddev %>%
  slt(PCGDP:GINI) %>%
  qDT() %>% 
  t %>%
  fsum(.) %>% 
  ftransform(.data = wlddev, New1 = .) %>%
  qDT() %>% 
  replace_NA(X = ., value = 0, cols = "New1")

Speed Comparison

library(microbenchmark)

microbenchmark(
  Out1 = 
    wlddev %>% 
    mutate_rowwise.(New1 = sum(c_across.(PCGDP:GINI), na.rm = TRUE))
, Out2 =
    wlddev %>%
    slt(PCGDP:GINI) %>%
    t %>%
    as_tibble %>%
    fsum(.) %>% 
    ftransform(wlddev, New1 = .)
, Out3 = 
    wlddev %>%
    slt(PCGDP:GINI) %>%
    qDT() %>% 
    t %>%
    fsum(.) %>% 
    ftransform(.data = wlddev, New1 = .) %>%
    qDT() %>% 
    replace_NA(X = ., value = 0, cols = "New1")
)

Unit: microseconds
 expr     min       lq      mean   median       uq      max neval
 Out1 72618.0 78268.75 81296.992 79888.50 81671.10 162397.8   100
 Out2 33549.7 35520.75 37763.537 37728.25 39021.90  55001.3   100
 Out3   241.2   310.85   360.225   357.40   387.35    780.1   100

Upvotes: 0

akrun

Reputation: 887213

According to ?fsum, fsum from collapse computes column-wise sums:

fsum is a generic function that computes the (column-wise) sum of all values in x, (optionally) grouped by g and/or weighted by w (e.g. to calculate survey totals).

The tidytable code sums row-wise, so one option is to select (slt) the columns of interest, transpose them, convert the result to a tibble/data.frame, apply fsum, and add the result as a new column:

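library(collapse)
library(magrittr)  # provides %>%
library(tibble)    # provides as_tibble()
Out2 <- wlddev %>%
    slt(PCGDP:GINI) %>%             # keep only the columns to be summed
    t %>%                           # transpose: rows become columns
    as_tibble %>%
    fsum(.) %>%                     # column sums of the transpose = row sums
    ftransform(wlddev, New1 = .)    # add the sums to the original data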

sum(..., na.rm = TRUE) returns 0 when all the elements are NA, whereas fsum, which uses na.rm = TRUE by default, returns NA in that case:

> fsum(c(NA, NA))
[1] NA
> sum(c(NA, NA), na.rm = TRUE)
[1] 0

Therefore, if we replace the NA values in Out2$New1 with 0, the output is the same as the OP's 'Out1':

> Out2$New1[is.na(Out2$New1)] <- 0
> all.equal(Out1, Out2, check.attributes = FALSE)
[1] TRUE
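
To see the two steps (transpose, then fix the all-NA case) in isolation, here is a small sketch on a toy matrix. The matrix m and its values are hypothetical, and the example assumes collapse is installed. It relies only on fsum applied to a matrix returning one sum per column.

```r
library(collapse)

# Toy matrix (hypothetical values); the third row is entirely missing
m <- cbind(a = c(1, NA, NA), b = c(2, 3, NA))

# fsum works column-wise, so summing the columns of t(m)
# gives the row sums of m (missing values are skipped by default)
row_totals <- fsum(t(m))

# fsum leaves NA where every value was missing; set those to 0
# to match sum(..., na.rm = TRUE)
row_totals[is.na(row_totals)] <- 0

# Should agree with base R's row-wise sum
stopifnot(isTRUE(all.equal(as.numeric(row_totals),
                           as.numeric(rowSums(m, na.rm = TRUE)))))
```

If the check passes silently, the transposed fsum and rowSums agree on the toy data.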

Upvotes: 2
