Marc-Olivier Gasser

Reputation: 33

How to implement a function using multiple columns in dplyr

This question was previously asked: How to round numbers of a data frame in R preserving the sum?

I want to apply the following function in dplyr. It rounds a vector to a given number of digits while preserving its sum:

round_preserve_sum <- function(x, digits = 0) {
  up <- 10 ^ digits
  x <- x * up
  y <- floor(x)
  indices <- tail(order(x-y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y / up
}

Here is a dataframe:

df <- data.frame(SAND = c(0.00000, 28.00000, 27.27273),
                 SILT = c(45.45455, 35.00000, 34.34343),
                 CLAY = c(54.54545, 37.00000, 38.38384))

Using this function with these values separately, I get:

round_preserve_sum(c(0.00000, 45.45455, 54.54545), 0)

[1] 0 45 55

round_preserve_sum(c(28.00000, 35.00000, 37.00000), 0)

[1] 28 35 37

round_preserve_sum(c(27.27273, 34.34343, 38.38384), 0)

[1] 27 34 39

Each of these sums to 100.

When I apply this function in dplyr:

df.Rd0 <- df %>%
  mutate(across(c(SAND, SILT, CLAY), ~round_preserve_sum(.,0)),
         Sum = SAND + SILT + CLAY)

I get:

   SAND SILT CLAY  Sum 
1    0   46   55   101 
2   28   35   37   100 
3   27   34   38    99

Without the tilde:

df.Rd0 <- df %>%
  mutate(across(c(SAND, SILT, CLAY), round_preserve_sum(.,0)),
         Sum = SAND + SILT + CLAY)

I get this error message:

Error : Problem with `mutate()` input `..1`.
i `..1 = across(c(SAND, SILT, CLAY), round_preserve_sum(., 0))`.
x undefined columns selected

I guess the function is not programmed for vectors?

Upvotes: 3

Views: 102

Answers (1)

akrun

Reputation: 886948

The ~ creates a lambda expression, i.e. a short form for function(.x). Without it, pass the function itself (not a call to it) and supply its other arguments as named parameters:

library(dplyr)
df %>% 
  mutate(across(c(SAND, SILT, CLAY), round_preserve_sum, digits = 0),
         Sum = SAND + SILT + CLAY)

-output

  SAND SILT CLAY Sum
1    0   46   55 101
2   28   35   37 100
3   27   34   38  99
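
To see where the 46 in SILT comes from, it may help to call the function on one whole column, which is what across does for each selected column. A minimal sketch, reusing the function and data from the question:

```r
round_preserve_sum <- function(x, digits = 0) {
  up <- 10 ^ digits
  x <- x * up
  y <- floor(x)
  indices <- tail(order(x - y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y / up
}

df <- data.frame(SAND = c(0.00000, 28.00000, 27.27273),
                 SILT = c(45.45455, 35.00000, 34.34343),
                 CLAY = c(54.54545, 37.00000, 38.38384))

# across() hands each column to the function as one vector,
# so the sum being preserved is the column total, not the row total
silt_rounded <- round_preserve_sum(df$SILT, digits = 0)
silt_rounded
# [1] 46 35 34

# the rounded column total matches the rounded original column total (115)
sum(silt_rounded) == round(sum(df$SILT))
# [1] TRUE
```

So the function itself handles vectors fine; it is just being given columns instead of rows.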

Regarding the OP's manual calls that each sum to 100: those were applied row-wise, not column-wise, whereas across loops over columns. To get the same result we need rowwise with c_across:

df %>%
    rowwise %>% 
    mutate(Sum = sum(round_preserve_sum(c_across(everything()), 0))) %>%
    ungroup

-output

# A tibble: 3 x 4
   SAND  SILT  CLAY   Sum
  <dbl> <dbl> <dbl> <dbl>
1   0    45.5  54.5   100
2  28    35    37     100
3  27.3  34.3  38.4   100

The code above only adds the Sum column and leaves the original columns unrounded. If we want to return the rounded columns along with the sum, use pmap:

library(purrr)
df %>%
    pmap_dfr(~ {tmp <- round_preserve_sum(c(...), 0)
      c(tmp, Sum = sum(tmp))})
# A tibble: 3 x 4
   SAND  SILT  CLAY   Sum
  <dbl> <dbl> <dbl> <dbl>
1     0    45    55   100
2    28    35    37   100
3    27    34    39   100
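
For comparison, the same row-wise result can probably be obtained without any extra packages. This is only a sketch using base R apply, not part of the approaches above; the names rounded and out are made up for illustration:

```r
round_preserve_sum <- function(x, digits = 0) {
  up <- 10 ^ digits
  x <- x * up
  y <- floor(x)
  indices <- tail(order(x - y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y / up
}

df <- data.frame(SAND = c(0.00000, 28.00000, 27.27273),
                 SILT = c(45.45455, 35.00000, 34.34343),
                 CLAY = c(54.54545, 37.00000, 38.38384))

# MARGIN = 1 applies the function to each row; apply() returns the
# per-row results as columns, so transpose to restore the original layout
rounded <- t(apply(df, 1, round_preserve_sum, digits = 0))

out <- as.data.frame(rounded)
out$Sum <- rowSums(out)
out
#   SAND SILT CLAY Sum
# 1    0   45   55 100
# 2   28   35   37 100
# 3   27   34   39 100
```

Note that apply converts the data frame to a matrix first, so this assumes all columns are numeric, as they are here.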

This can be made faster with dapply from collapse (note that the code below overwrites df):

library(collapse)
df <- dapply(df, MARGIN = 1, FUN = round_preserve_sum, 0)
df$Sum <- rowSums(df, na.rm = TRUE)

Upvotes: 3
