Bear25

Reputation: 33

How do I write an R function that lets me manipulate multiple R variables using dplyr's %>% pipes?

I am trying to create a function to manipulate different datasets but am facing several issues with this task. A simplified version of the data I am trying to manipulate is given in the dput() output below:

structure(list(id = structure(c(2, 4, 6, 8, 10), label = "iid", format.spss = "F4.0", display_width = 0L), A = c(13, 9, 14, 14, 13), B = c(12, 0, 9, 3, 10), C = c(13, 8, 14, 13, 11)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

There are several things I am trying to do, but I get stuck at different junctures because of the way the data is formatted. First I need to sum up the values from columns A:D for each row into a variable called total. Next, I need to compute the probability by dividing each of columns A:D by total.

Here is where I face some issues. I wrote a function to perform the above:

functa <- function(x, id, vars) {
  
  x %>%
    mutate(total = rowSums(.[vars])) %>%
    mutate(prob = .[vars]/total)

}

When I call the function using the following line:

test <- functa(df_ED, "pid", c("A", "B", "C", "D"))

I get an object with 5 observations, but only 7 variables (instead of 10). When I inspect the object, I see 4 new variables (i.e., prob.A, prob.B, prob.C, prob.D), but they are stored as a single variable.

Any subsequent manipulations I would like to perform on this dataset cannot proceed as intended because of this. I have been working on this for the past two days but cannot find any information about this phenomenon and am guessing I am way in over my head.

My eventual goal with this function is to:

  1. compute a total variable (sum of A:D)
  2. compute a prob variable that should output 4 variables (i.e., A/total, B/total, etc.)
  3. recode the prob variables such that all infinite values (i.e., "Inf") are recoded to 0
  4. sum all 4 prob variables into a single totalprob variable

Would appreciate any insights into this!

Upvotes: 1

Views: 107

Answers (2)

Owe Jessen

Reputation: 247

A different solution is to reshape the table: first use pivot_longer to get one row per id and column, compute the probability within each id, then use pivot_wider to get back to the desired wide layout.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(-id, names_to = "key", values_to = "value") %>%
  group_by(id) %>%
  mutate(prob = value / sum(value)) %>%
  pivot_wider(names_from = key, values_from = c(value, prob))

# A tibble: 5 x 7
# Groups:   id [5]
#     id value_A value_B value_C prob_A prob_B prob_C
#  <dbl>   <dbl>   <dbl>   <dbl>  <dbl>  <dbl>  <dbl>
#1     2      13      12      13  0.342  0.316  0.342
#2     4       9       0       8  0.529  0      0.471
#3     6      14       9      14  0.378  0.243  0.378
#4     8      14       3      13  0.467  0.1    0.433
#5    10      13      10      11  0.382  0.294  0.324
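
The pipeline above stops at the probabilities. As a rough sketch only, the remaining steps from the question (recoding non-finite values to 0 and summing the probabilities) could be done while the data is still in long format. The function name prob_long is made up for illustration, and the tibble is a hand-typed copy of the question's sample data without the SPSS attributes on id.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (SPSS attributes on id left out)
df <- tibble(
  id = c(2, 4, 6, 8, 10),
  A  = c(13, 9, 14, 14, 13),
  B  = c(12, 0, 9, 3, 10),
  C  = c(13, 8, 14, 13, 11)
)

# prob_long() is a hypothetical helper, not part of any package
prob_long <- function(x, id, vars) {
  x %>%
    # One row per id and column named in vars
    pivot_longer(all_of(vars), names_to = "key", values_to = "value") %>%
    group_by(across(all_of(id))) %>%
    mutate(
      total = sum(value),
      prob  = value / total,
      # n/0 gives Inf and 0/0 gives NaN; is.finite() is FALSE for both
      prob  = if_else(is.finite(prob), prob, 0),
      totalprob = sum(prob)
    ) %>%
    ungroup() %>%
    # Back to one row per id, with value_* and prob_* columns
    pivot_wider(names_from = key, values_from = c(value, prob))
}

res <- prob_long(df, "id", c("A", "B", "C"))
```

Using is.finite() rather than is.infinite() is a deliberate choice in this sketch: a row whose total is 0 would produce 0/0, which is NaN rather than Inf.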

Upvotes: 1

Ronak Shah

Reputation: 388907

When you want to apply a function to multiple columns, use across():

library(dplyr)

functa <- function(x, id, vars) {
  
  x %>%
    mutate(
      # Sum the columns named in vars, row by row
      total = rowSums(.[vars]),
      # Divide each vars column by total and store the results in new *_prob columns
      across(all_of(vars), ~ . / total, .names = '{col}_prob'),
      # Replace infinite values in the *_prob columns with 0
      across(ends_with('_prob'), ~ replace(., is.infinite(.), 0))
    ) %>%
    # Sum all *_prob columns into totalprob
    mutate(totalprob = rowSums(select(., ends_with('_prob'))))
  
}

functa(df_ED, "pid", c("A", "B", "C"))

#     id     A     B     C total A_prob B_prob C_prob totalprob
#  <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>     <dbl>
#1     2    13    12    13    38  0.342  0.316  0.342         1
#2     4     9     0     8    17  0.529  0      0.471         1
#3     6    14     9    14    37  0.378  0.243  0.378         1
#4     8    14     3    13    30  0.467  0.1    0.433         1
#5    10    13    10    11    34  0.382  0.294  0.324         1
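
As for why the original function gave 7 variables: .[vars]/total returns a whole data frame, and mutate() most likely stored that data frame as a single column named prob (a so-called data-frame column). A small sketch to check this, assuming a hand-typed copy of the question's sample data without the SPSS attributes; tidyr::unpack() is one way to flatten such a column afterwards.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (SPSS attributes on id left out)
df <- tibble(
  id = c(2, 4, 6, 8, 10),
  A  = c(13, 9, 14, 14, 13),
  B  = c(12, 0, 9, 3, 10),
  C  = c(13, 8, 14, 13, 11)
)
vars <- c("A", "B", "C")

# The same two steps as in the question's function
packed <- df %>%
  mutate(total = rowSums(.[vars])) %>%
  mutate(prob = .[vars] / total)

ncol(packed)                # prob counts as one column
is.data.frame(packed$prob)  # prob is itself a data frame
names(packed$prob)          # the columns nested inside prob

# Flatten the nested columns into ordinary ones named prob_A, prob_B, ...
flat <- packed %>% unpack(prob, names_sep = "_")
names(flat)
```

The across() version above avoids the nested column in the first place, so unpack() is only needed if you already have such an object.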

Upvotes: 2
