Reputation: 1371
Given the following data in long format, I would like to reshape it to wide format. The solution should work for an arbitrary number of timepoints.
dat <- structure(list(srdr_id = c("172507", "172507", "172507", "172507",
"172619", "172619", "172619", "172619"), arm = c("CBT_Educ",
"CBT_MI", "CBT_Educ", "CBT_MI", "MI", "Educ", "MI", "Educ"),
timepoint = c(0, 0, 3, 3, 0, 0, 3, 3), n = c(102, 103, 100,
101, 58, 61, 45, 53), mean = c(37.69, 40.23, 34.53, 31.8,
4.6, 4.3, 4.4, 4.1), sd = c(16.06, 14.23, 19.78, 19.67, 2.2,
2.2, 2.3, 2.5)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-8L))
Long dataset:
srdr_id arm timepoint n mean sd
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 172507 CBT_Educ 0 102 37.7 16.1
2 172507 CBT_MI 0 103 40.2 14.2
3 172507 CBT_Educ 3 100 34.5 19.8
4 172507 CBT_MI 3 101 31.8 19.7
5 172619 MI 0 58 4.6 2.2
6 172619 Educ 0 61 4.3 2.2
7 172619 MI 3 45 4.4 2.3
8 172619 Educ 3 53 4.1 2.5
I would like to create a wide dataset in which each combination of srdr_id and arm is a single row, with the three variables (n, mean and sd) repeated as separate columns for each timepoint.
srdr_id arm n.0 mean.0 sd.0 n.3 mean.3 sd.3
1 172507 CBT_Educ 102 37.7 16.1 100 34.5 19.8
2 172507 CBT_MI 103 40.2 14.2 101 31.8 19.7
5 172619 MI 58 4.6 2.2 45 4.4 2.3
6 172619 Educ 61 4.3 2.2 53 4.1 2.5
The following call:
data.table::dcast(data = dat, srdr_id + arm, value.var = c(n_analyzed, mean, sd))
failed with:
Error in is.formula(formula) : object 'srdr_id' not found
Upvotes: 1
Reputation: 5405
A common workflow for this type of situation is gathering all the metrics, renaming them, and then spreading again. See below:
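library(dplyr)
library(tidyr)
dat %>%
gather("measure", "val", n, mean, sd) %>% # stack n, mean and sd into key/value pairs
mutate(measure = paste0(measure, ".", timepoint)) %>% # e.g. "n.0", "mean.3"
select(-timepoint) %>%
spread(measure, val) # one column per measure/timepoint pair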
# A tibble: 4 x 8
srdr_id arm mean.0 mean.3 n.0 n.3 sd.0 sd.3
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 172507 CBT_Educ 37.7 34.5 102 100 16.1 19.8
2 172507 CBT_MI 40.2 31.8 103 101 14.2 19.7
3 172619 Educ 4.3 4.1 61 53 2.2 2.5
4 172619 MI 4.6 4.4 58 45 2.2 2.3
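In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). As a minimal sketch, assuming such a version is installed, the same reshape can be written in one step. The data frame below only re-creates dat from the question so that the block runs on its own.

```r
library(tidyr)

# Re-create the question's data as a plain data frame
dat <- data.frame(
  srdr_id   = rep(c("172507", "172619"), each = 4),
  arm       = c("CBT_Educ", "CBT_MI", "CBT_Educ", "CBT_MI",
                "MI", "Educ", "MI", "Educ"),
  timepoint = c(0, 0, 3, 3, 0, 0, 3, 3),
  n         = c(102, 103, 100, 101, 58, 61, 45, 53),
  mean      = c(37.69, 40.23, 34.53, 31.8, 4.6, 4.3, 4.4, 4.1),
  sd        = c(16.06, 14.23, 19.78, 19.67, 2.2, 2.2, 2.3, 2.5),
  stringsAsFactors = FALSE
)

# Spread n, mean and sd across timepoints; srdr_id and arm stay as row ids.
# names_sep = "." yields column names such as n.0 and mean.3.
wide <- pivot_wider(
  dat,
  names_from  = timepoint,
  values_from = c(n, mean, sd),
  names_sep   = "."
)
```

Because the column names are built from whatever values occur in timepoint, this should carry over to any number of timepoints without changes.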
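The same melt, rename and cast steps with data.table:
library(data.table)
library(magrittr) # provides %>%, which data.table does not load
dt <- as.data.table(dat)
melt(dt, id.vars = c("srdr_id", "arm", "timepoint"))[
,`:=`(variable = paste0(variable, ".", timepoint), timepoint = NULL)
] %>%
dcast(srdr_id + arm ~ variable, value.var = "value")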
srdr_id arm mean.0 mean.3 n.0 n.3 sd.0 sd.3
1: 172507 CBT_Educ 37.69 34.53 102 100 16.06 19.78
2: 172507 CBT_MI 40.23 31.80 103 101 14.23 19.67
3: 172619 Educ 4.30 4.10 61 53 2.20 2.50
4: 172619 MI 4.60 4.40 58 45 2.20 2.30
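data.table's dcast() also accepts several value.var columns at once, which makes the melt step unnecessary. The sketch below is roughly what the call in the question appears to have been aiming for. It differs in two ways: the formula needs a ~ separating the row identifiers from the column being spread, and value.var must be a character vector of existing column names. The question's n_analyzed is not a column of dat, so n is assumed here. The data.table() call only re-creates dat so that the block is self-contained.

```r
library(data.table)

# Re-create the question's data as a data.table
dt <- data.table(
  srdr_id   = rep(c("172507", "172619"), each = 4),
  arm       = c("CBT_Educ", "CBT_MI", "CBT_Educ", "CBT_MI",
                "MI", "Educ", "MI", "Educ"),
  timepoint = c(0, 0, 3, 3, 0, 0, 3, 3),
  n         = c(102, 103, 100, 101, 58, 61, 45, 53),
  mean      = c(37.69, 40.23, 34.53, 31.8, 4.6, 4.3, 4.4, 4.1),
  sd        = c(16.06, 14.23, 19.78, 19.67, 2.2, 2.2, 2.3, 2.5)
)

# Left of ~: columns identifying a row; right of ~: column whose values
# become part of the new column names. sep = "." gives n.0, mean.3, etc.
wide <- dcast(
  dt,
  srdr_id + arm ~ timepoint,
  value.var = c("n", "mean", "sd"),
  sep = "."
)
```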
Upvotes: 3
Reputation: 2185
One alternative (probably not the most elegant) is to use group_by() and summarise() from the dplyr package.
No calculations are needed here, since all the values are already in your initial dataset, so you can use functions like first() and last() to specify which values you want. This assumes the rows are sorted by timepoint within each group, and it picks up only the first and last timepoint.
library(dplyr)
dat %>%
group_by(srdr_id, arm) %>%
summarise(
# first() takes the timepoint 0 row, last() the timepoint 3 row
n0 = first(n), mean0 = first(mean), sd0 = first(sd),
n3 = last(n), mean3 = last(mean), sd3 = last(sd)
)
# srdr_id arm n0 mean0 sd0 n3 mean3 sd3
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 172507 CBT_Educ 102 37.7 16.1 100 34.5 19.8
# 2 172507 CBT_MI 103 40.2 14.2 101 31.8 19.7
# 3 172619 Educ 61 4.3 2.2 53 4.1 2.5
# 4 172619 MI 58 4.6 2.2 45 4.4 2.3
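Since first() and last() cover only two timepoints, a version that scales to any number of them may be useful. One option is base R's reshape(). This is a different technique from the summarise() approach above, shown only as a sketch. The data frame re-creates dat from the question, as a plain data.frame rather than a tibble.

```r
# Re-create the question's data as a plain data frame
dat <- data.frame(
  srdr_id   = rep(c("172507", "172619"), each = 4),
  arm       = c("CBT_Educ", "CBT_MI", "CBT_Educ", "CBT_MI",
                "MI", "Educ", "MI", "Educ"),
  timepoint = c(0, 0, 3, 3, 0, 0, 3, 3),
  n         = c(102, 103, 100, 101, 58, 61, 45, 53),
  mean      = c(37.69, 40.23, 34.53, 31.8, 4.6, 4.3, 4.4, 4.1),
  sd        = c(16.06, 14.23, 19.78, 19.67, 2.2, 2.2, 2.3, 2.5),
  stringsAsFactors = FALSE
)

# idvar: columns identifying a row in the wide result
# timevar: column whose values are appended to the names (n.0, n.3, ...)
# All remaining columns (n, mean, sd) are treated as time-varying.
wide <- reshape(
  dat,
  idvar     = c("srdr_id", "arm"),
  timevar   = "timepoint",
  direction = "wide"
)
```

With this call the columns come out grouped by timepoint (n.0, mean.0, sd.0, then n.3, mean.3, sd.3), which matches the layout requested in the question.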
Upvotes: 1