dan

Reputation: 6314

Using tidyverse to reshape a data.frame and its column names

I have a data.frame of some experiment with several factors and measured values for each sample. For example:

factors <- c("age","sex")

The data.frame looks like this:

library(dplyr)
set.seed(1)
df <- do.call(rbind,lapply(1:10,function(i) expand.grid(age=c("Y","O"),sex=c("F","M")) %>% dplyr::mutate(val=rnorm(4))))

I want to create a data.frame with a single row and one column per combination of factor levels (i.e. nrow(expand.grid(age=c("Y","O"),sex=c("F","M"))) columns in this example), where each value is the mean of df$val for the corresponding combination of factors.

To get the mean df$val for each combination of factors I do:

grouped.mean.val.df <- df %>% dplyr::group_by_(.dots=factors) %>% dplyr::summarise(mean.val=mean(val))

And the resulting data.frame I'd like to obtain is:

res.df <- data.frame(Y.F=grouped.mean.val.df$mean.val[1],
                     Y.M=grouped.mean.val.df$mean.val[2],
                     O.F=grouped.mean.val.df$mean.val[3],
                     O.M=grouped.mean.val.df$mean.val[4])

Is there a tidyverse way to get that?

Upvotes: 2

Views: 168

Answers (1)

akrun

Reputation: 887851

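We can use unite followed by spread. unite combines the 'age' and 'sex' columns into a single column, mutate converts that column to a factor whose levels follow the order of appearance (so the columns come out in the same order as in the expected output), and spread reshapes the result to 'wide' format: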

library(tidyverse)
grouped.mean.val.df %>%
   unite(agesex, age, sex, sep=".") %>% 
   mutate(agesex = factor(agesex, levels = unique(agesex))) %>%
   spread(agesex, mean.val)
# A tibble: 1 x 4
#     Y.F   Y.M    O.F     O.M
#   <dbl> <dbl>  <dbl>   <dbl>
#1 0.0695 0.411 -0.118 0.00577

Also, instead of group_by_, we can use group_by_at, which takes a character vector of column names:

df %>%
     group_by_at(factors) %>%
     summarise(mean.val = mean(val)) %>%
     unite(agesex, age, sex, sep=".") %>% 
     mutate(agesex = factor(agesex, levels = unique(agesex))) %>%
     spread(agesex, mean.val)
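
With newer package versions (assuming dplyr >= 1.0.0 and tidyr >= 1.0.0), the same reshape can probably be written without unite and spread. The following is only a sketch of that alternative, not the approach used above: group_by(across(all_of(...))) stands in for group_by_at, and pivot_wider stands in for spread, building the column names directly from both factor columns. It reuses df and factors from the question.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
set.seed(1)
factors <- c("age", "sex")
df <- do.call(rbind, lapply(1:10, function(i) {
  expand.grid(age = c("Y", "O"), sex = c("F", "M")) %>%
    mutate(val = rnorm(4))
}))

res.df <- df %>%
  # group by the columns named in the character vector `factors`
  group_by(across(all_of(factors))) %>%
  # drop the grouping so pivot_wider sees a plain tibble
  summarise(mean.val = mean(val), .groups = "drop") %>%
  # one column per age/sex combination, names joined with "."
  pivot_wider(names_from = all_of(factors),
              values_from = mean.val,
              names_sep = ".")

names(res.df)
```

pivot_wider creates the new columns in order of first appearance by default, so the factor conversion step should not be needed here: summarise already returns the rows ordered by the factor levels of age and sex.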

Upvotes: 3
