Marco De Virgilis
Marco De Virgilis

Reputation: 1087

group and split columns dplyr

I have a dataframe that look like this one:

structure(list(group = c("A", "B", "C"), mean_meth1 = c(10, 20, 
30), mean_meth2 = c(15, 25, 35), lower_meth1 = c(7, 17, 27), 
lower_meth2 = c(12, 22, 32), upper_meth1 = c(13, 23, 33), 
upper_meth2 = c(18, 28, 38)), class = "data.frame", row.names = c(NA, 
-3L))

    group mean_meth1 mean_meth2 lower_meth1 lower_meth2 upper_meth1 upper_meth2
1     A         10         15           7          12          13          18
2     B         20         25          17          22          23          28
3     C         30         35          27          32          33          38

3 rows for 3 different groups, A,B,C with two experiments carried out on each group. Each methodology has a mean value, an upper limit and a lower limit. I need to plot this structure and it will be easier if I can get the data.frame in the following shape:

  group method mean lower upper
1     A  meth1   10     7    13
2     A  meth2   15    12    18
3     B  meth1   20    17    23
4     B  meth2   25    22    28
5     C  meth1   30    27    33
6     C  meth2   35    32    38

I tried with pivot_longer and pivot_wider from dplyr but I haven't got much. Could you guys help me? Thank you

Upvotes: 0

Views: 254

Answers (1)

Ian Campbell
Ian Campbell

Reputation: 24770

One thing you might have been trying to get to was this:

library(dplyr)
library(tidyr)
df %>% 
  pivot_longer(-group, names_to = "method") %>%
  separate(method, into = c("variable","method"), sep = "_") %>%
  pivot_wider(id_cols = c("group","method"), names_from = "variable")
# A tibble: 6 x 5
  group method  mean lower upper
  <chr> <chr>  <dbl> <dbl> <dbl>
1 A     meth1     10     7    13
2 A     meth2     15    12    18
3 B     meth1     20    17    23
4 B     meth2     25    22    28
5 C     meth1     30    27    33
6 C     meth2     35    32    38

However, there is a built in special "sentinel" in pivot_longer, .value. From help(pivot_longer):

values_to: A string specifying the name of the column to create from the data stored in cell values. If names_to is a character containing the special .value sentinel, this value will be ignored, and the name of the value column will be derived from part of the existing column names.

df %>% 
  pivot_longer(-group, names_to = c(".value", "method"), names_sep = "_")
# A tibble: 6 x 5
  group method  mean lower upper
  <chr> <chr>  <dbl> <dbl> <dbl>
1 A     meth1     10     7    13
2 A     meth2     15    12    18
3 B     meth1     20    17    23
4 B     meth2     25    22    28
5 C     meth1     30    27    33
6 C     meth2     35    32    38

Thus, the first half of the column name is ignored and the name fo the value column is derived from that part of the column name, as described in help.

Upvotes: 4

Related Questions