Reputation: 1334
I would like to use dplyr::mutate to add two named columns to a data frame simultaneously, with a single function call. Consider the following example:
library(dplyr)
n <- 1e2; M <- 1e3
variance <- 1
x <- rnorm(n*M, 0, variance)
s <- rep(1:M, each = n)
dat <- data.frame(s = s, x = x)
ci_studclt <- function(x, alpha = 0.05) {
n <- length(x)
S_n <- var(x)
mean(x) + qt(c(alpha/2, 1 - alpha/2), df = n-1)*sqrt(S_n / n)
}
ci_studclt(x)
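Calling the function on a single group shows why mutate() has trouble with it: ci_studclt() returns a length-2 numeric vector (lower, upper), not one value per row. This is a minimal sketch. set.seed() and the name x1 are additions for reproducibility and are not part of the original example.

```r
# Same function as in the question.
ci_studclt <- function(x, alpha = 0.05) {
  n <- length(x)
  S_n <- var(x)
  mean(x) + qt(c(alpha/2, 1 - alpha/2), df = n-1)*sqrt(S_n / n)
}

set.seed(1)            # added only for reproducibility
x1 <- rnorm(100)       # one simulated group of n = 100 values

ci <- ci_studclt(x1)
length(ci)             # 2: lower and upper bound
# The interval is symmetric around the sample mean, because qt() is symmetric.
all.equal(mean(x1) - ci[1], ci[2] - mean(x1))
```

Inside a group of 100 rows, mutate() needs a result of size 1 or 100, so a size-2 result is rejected.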
Trying something like the following returns an error, since two values are produced per group and they cannot be inserted into a single atomic column.
dat %>%
group_by(s) %>%
mutate(ci = ci_studclt(x, variance))
It seems one option is to insert a list column and then call unnest_wider, and that this is easier with data.table or in the specific case of splitting a string column into two new columns.
In my example, a confidence interval (lower and upper bound) comes out of a function, and I would like to add both bounds directly as new columns to dat, e.g. calling the columns ci_lower and ci_upper.
Is there a straightforward way of doing this with dplyr, or do I need to insert the elements as a list column and then unnest?
NB: Keep in mind that the confidence interval values are a function of a group of simulated values x, grouped by s; the CI values should be constant within a group.
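For comparison, a dplyr-only route that avoids list columns altogether is to compute one row per group with summarise() and join it back onto the original rows. This is a sketch, not the only option; the seed and the intermediate names ci_by_group and dat_ci are assumptions added for illustration, and the variance argument from the original call is dropped because ci_studclt() does not take one.

```r
library(dplyr)

ci_studclt <- function(x, alpha = 0.05) {
  n <- length(x)
  S_n <- var(x)
  mean(x) + qt(c(alpha/2, 1 - alpha/2), df = n-1)*sqrt(S_n / n)
}

set.seed(1)
n <- 1e2; M <- 1e3
dat <- data.frame(s = rep(1:M, each = n), x = rnorm(n*M, 0, 1))

# One row per group: both bounds as ordinary numeric columns.
# (ci_studclt() is called twice per group here, which is simple but not optimal.)
ci_by_group <- dat %>%
  group_by(s) %>%
  summarise(ci_lower = ci_studclt(x)[1],
            ci_upper = ci_studclt(x)[2],
            .groups = "drop")

# Join back so every row of dat carries its group's interval.
dat_ci <- left_join(dat, ci_by_group, by = "s")
```

Because the interval is computed once per value of s and then joined, it is constant within each group by construction.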
Upvotes: 3
Views: 880
Reputation: 41
Another possible variation, if you don't want to change your ci_studclt function:
dat %>%
group_by(s) %>%
mutate(
across(x,
.fns = list(
lower = ~ci_studclt(.)[1],
upper = ~ci_studclt(.)[2]
)
)
)
In this case the output will contain new x_lower and x_upper columns. This variant is also somewhat scalable: if you want to apply your function to another column y as well, you can replace x with c(x, y) and also get y_lower and y_upper columns in dat.
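A minimal sketch of the c(x, y) extension described above. The column y, the smaller data size and the seed are hypothetical additions; the new column names come from across()'s default naming of "{.col}_{.fn}".

```r
library(dplyr)

ci_studclt <- function(x, alpha = 0.05) {
  n <- length(x)
  S_n <- var(x)
  mean(x) + qt(c(alpha/2, 1 - alpha/2), df = n-1)*sqrt(S_n / n)
}

set.seed(1)
dat2 <- data.frame(s = rep(1:10, each = 100),
                   x = rnorm(1000),
                   y = rnorm(1000, mean = 5))  # hypothetical second column

res <- dat2 %>%
  group_by(s) %>%
  mutate(
    across(c(x, y),
      .fns = list(
        lower = ~ci_studclt(.)[1],  # first element: lower bound
        upper = ~ci_studclt(.)[2]   # second element: upper bound
      )
    )
  ) %>%
  ungroup()

names(res)  # includes x_lower, x_upper, y_lower, y_upper
```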
Actually, everything Allan did in his answer can be done inside the mutate call, without modifying the original function:
dat %>%
group_by(s) %>%
mutate(
t(ci_studclt(x)) %>%
as.data.frame() %>%
setNames(c('ci_lower','ci_upper'))
)
We just transpose the output of ci_studclt(x) so that as.data.frame treats it as a single row, and then give this one-row data frame the correct names.
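To see the intermediate object on its own, here is a sketch on one simulated group. It uses setNames() from base R (stats), which does the same renaming as set_names() but does not need purrr or rlang to be attached; x1 and the seed are illustrative additions.

```r
library(dplyr)  # for %>%

ci_studclt <- function(x, alpha = 0.05) {
  n <- length(x)
  S_n <- var(x)
  mean(x) + qt(c(alpha/2, 1 - alpha/2), df = n-1)*sqrt(S_n / n)
}

set.seed(1)
x1 <- rnorm(100)

row_df <- t(ci_studclt(x1)) %>%       # 1 x 2 matrix
  as.data.frame() %>%                 # columns V1, V2
  setNames(c("ci_lower", "ci_upper")) # rename to the wanted names

dim(row_df)  # 1 row, 2 columns
```

Inside mutate() this one-row data frame is recycled to the size of each group, and because it is passed unnamed its columns are spliced in as ci_lower and ci_upper.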
Upvotes: 1
Reputation: 17069
You can do this by having your function (or a wrapper function) return a data.frame. When you call it in mutate, don't specify a column name (or else you'll end up with a nested data.frame column). If you want to specify names for the new columns, you can include them as function arguments, as below.
library(dplyr)
n <- 1e2; M <- 1e3
variance <- 1
x <- rnorm(n*M, 0, variance)
s <- rep(1:M, each = n)
dat <- data.frame(s = s, x = x)
ci_studclt <- function(x, alpha = 0.05) {
n <- length(x)
S_n <- var(x)
mean(x) + qt(c(alpha/2, 1 - alpha/2), df = n-1)*sqrt(S_n / n)
}
ci_wrapper <- function(x, alpha = 0.05, names_out = c("ci_lower", "ci_upper")) {
ci <- ci_studclt(x, alpha = alpha)
out <- data.frame(ci[[1]], ci[[2]])
names(out) <- names_out
out
}
# original code was ci_studclt(x, variance)
# but ci_studclt() doesn't take a variance argument, so I omitted it
dat %>%
group_by(s) %>%
mutate(ci_wrapper(x))
output:
# A tibble: 100,000 x 4
# Groups: s [1,000]
s x ci_lower ci_upper
<int> <dbl> <dbl> <dbl>
1 1 0.233 -0.223 0.139
2 1 1.03 -0.223 0.139
3 1 1.53 -0.223 0.139
4 1 0.0150 -0.223 0.139
5 1 -0.211 -0.223 0.139
6 1 -1.13 -0.223 0.139
7 1 -1.51 -0.223 0.139
8 1 0.371 -0.223 0.139
9 1 1.80 -0.223 0.139
10 1 -0.137 -0.223 0.139
# ... with 99,990 more rows
With specified column names:
dat %>%
group_by(s) %>%
mutate(ci_wrapper(x, names_out = c("ci.lo", "ci.hi")))
output:
# A tibble: 100,000 x 4
# Groups: s [1,000]
s x ci.lo ci.hi
<int> <dbl> <dbl> <dbl>
1 1 0.233 -0.223 0.139
2 1 1.03 -0.223 0.139
3 1 1.53 -0.223 0.139
4 1 0.0150 -0.223 0.139
5 1 -0.211 -0.223 0.139
6 1 -1.13 -0.223 0.139
7 1 -1.51 -0.223 0.139
8 1 0.371 -0.223 0.139
9 1 1.80 -0.223 0.139
10 1 -0.137 -0.223 0.139
# ... with 99,990 more rows
Upvotes: 2
Reputation: 173793
If you get your function to return a two-column data frame with repeated values of the same length as the input, then this becomes very easy:
ci_studclt <- function(x, alpha = 0.05) {
n <- length(x)
S_n <- var(x)
res <- mean(x) + qt(c(alpha/2, 1 - alpha/2), df = n-1)*sqrt(S_n / n)
data.frame(lower = rep(res[1], length(x)), upper = res[2])
}
dat %>%
group_by(s) %>%
mutate(ci_studclt(x))
#> # A tibble: 100,000 x 4
#> # Groups: s [1,000]
#> s x lower upper
#> <int> <dbl> <dbl> <dbl>
#> 1 1 -0.767 -0.147 0.293
#> 2 1 -0.480 -0.147 0.293
#> 3 1 -1.31 -0.147 0.293
#> 4 1 0.219 -0.147 0.293
#> 5 1 0.650 -0.147 0.293
#> 6 1 0.542 -0.147 0.293
#> 7 1 -0.249 -0.147 0.293
#> 8 1 2.22 -0.147 0.293
#> 9 1 -0.239 -0.147 0.293
#> 10 1 0.176 -0.147 0.293
#> # ... with 99,990 more rows
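A variation on this idea, sketched under the assumption that you are on dplyr 1.0 or later: the function does not need to repeat the values itself, because mutate() recycles a one-row result to the size of each group. The function name ci_studclt_row, the seed and the data size are hypothetical.

```r
library(dplyr)

# Hypothetical variant of ci_studclt() returning a single row.
ci_studclt_row <- function(x, alpha = 0.05) {
  n <- length(x)
  S_n <- var(x)
  res <- mean(x) + qt(c(alpha/2, 1 - alpha/2), df = n-1)*sqrt(S_n / n)
  data.frame(lower = res[1], upper = res[2])  # one row; mutate() recycles it
}

set.seed(1)
dat <- data.frame(s = rep(1:10, each = 100), x = rnorm(1000))

res <- dat %>%
  group_by(s) %>%
  mutate(ci_studclt_row(x)) %>%
  ungroup()
```

Returning one row keeps the function usable outside mutate() as well, for example in summarise().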
Upvotes: 1