Reputation: 33
I am trying to create a function to manipulate different datasets, but am facing several issues with this task. I am providing a simplified version of the data I am trying to manipulate in the dput() output below:
structure(list(id = structure(c(2, 4, 6, 8, 10), label = "iid", format.spss = "F4.0", display_width = 0L), A = c(13, 9, 14, 14, 13), B = c(12, 0, 9, 3, 10), C = c(13, 8, 14, 13, 11)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
There are several things I am trying to do, but I get stuck at different junctures because of the way the data is formatted. First, I need to sum up the values from columns A:D for each row into a variable called total. Next, I need to compute the probability by dividing each of columns A:D by total.
Here is where I face some issues. I wrote a function to perform the above:
functa <- function(x, id, vars) {
  x %>%
    mutate(total = rowSums(.[vars])) %>%
    mutate(prob = .[vars]/total)
}
When I call the function using the following line:
test <- functa(df_ED, "pid", c("A", "B", "C", "D"))
I get an object with 5 observations, but only 7 variables (instead of 10). When I inspect the object, I see 4 new variables (i.e., prob.A, prob.B, prob.C, prob.D), but they are stored as a single variable.
Any subsequent manipulations I would like to perform on this dataset cannot proceed as intended because of this. I have been working on this for the past two days but cannot find any information about this phenomenon and am guessing I am way in over my head.
My eventual goal with this function is to:
1. Create a total variable (sum of A:D)
2. Create a prob variable that should output 4 variables (i.e., A/total, B/total, etc.)
3. Recode the prob variable such that all infinity values (i.e., "Inf") are recoded into 0
4. Sum the prob variables into a single totalprob variable
Would appreciate any insights into this!
Upvotes: 1
Views: 107
Reputation: 247
A different solution would be to change the layout of the table: in the first step with pivot_longer, where you calculate the probability, and in the next step with pivot_wider, where you get the desired final layout.
> df %>%
+ pivot_longer(-id, names_to = "key", values_to = "value") %>%
+ group_by(id) %>%
+ mutate(prob = value / sum(value)) %>%
+ pivot_wider(names_from = key, values_from = c(value, prob))
# A tibble: 5 x 7
# Groups:   id [5]
     id value_A value_B value_C prob_A prob_B prob_C
  <dbl>   <dbl>   <dbl>   <dbl>  <dbl>  <dbl>  <dbl>
1     2      13      12      13  0.342  0.316  0.342
2     4       9       0       8  0.529  0      0.471
3     6      14       9      14  0.378  0.243  0.378
4     8      14       3      13  0.467  0.1    0.433
5    10      13      10      11  0.382  0.294  0.324
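The pivot approach above could also be wrapped in a function that covers the remaining goals from the question (recoding non-finite probabilities to 0 and adding a totalprob column). The following is only a sketch: the name prob_long is hypothetical, the data is re-entered as a plain tibble without the SPSS attributes on id, and it assumes the id column is unique per row so that grouping by it gives row-wise sums.

```r
library(dplyr)
library(tidyr)

# Simplified data from the question (SPSS attributes on id omitted)
df_ED <- tibble(
  id = c(2, 4, 6, 8, 10),
  A = c(13, 9, 14, 14, 13),
  B = c(12, 0, 9, 3, 10),
  C = c(13, 8, 14, 13, 11)
)

# Hypothetical helper: `id` is the name of a unique row identifier,
# `vars` are the columns to turn into probabilities
prob_long <- function(x, id, vars) {
  x %>%
    pivot_longer(all_of(vars), names_to = "key", values_to = "value") %>%
    group_by(across(all_of(id))) %>%
    mutate(total = sum(value),
           prob = value / total,
           # A zero total yields NaN or Inf; recode anything non-finite to 0
           prob = if_else(is.finite(prob), prob, 0),
           totalprob = sum(prob)) %>%
    ungroup() %>%
    pivot_wider(names_from = key, values_from = c(value, prob))
}

res <- prob_long(df_ED, "id", c("A", "B", "C"))
```

The result should keep id, total and totalprob as identifying columns and spread the rest into value_* and prob_* columns. Calling ungroup() before pivot_wider avoids carrying the grouping into the final table.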
Upvotes: 1
Reputation: 388907
When you want to apply a function to multiple columns, use across:
library(dplyr)
functa <- function(x, id, vars) {
  x %>%
    # Sum all vars columns
    mutate(total = rowSums(.[vars]),
           # Divide each vars column by total and create new columns ending in _prob
           across(all_of(vars), ~ . / total, .names = '{col}_prob'),
           # Replace infinite values in the prob columns with 0
           across(ends_with('_prob'), ~ replace(., is.infinite(.), 0))) %>%
    # Sum all prob columns
    mutate(totalprob = rowSums(select(., ends_with('prob'))))
}
functa(df_ED, "pid", c("A", "B", "C"))
#     id     A     B     C total A_prob B_prob C_prob totalprob
#  <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>  <dbl>     <dbl>
#1     2    13    12    13    38  0.342  0.316  0.342         1
#2     4     9     0     8    17  0.529  0      0.471         1
#3     6    14     9    14    37  0.378  0.243  0.378         1
#4     8    14     3    13    30  0.467  0.1    0.433         1
#5    10    13    10    11    34  0.382  0.294  0.324         1
Upvotes: 2