Meriem
Meriem

Reputation: 33

pivot longer multiple column to 2 results columns

I have the following data frame:

Stat=c("Min","Q1","Med","Q3","Max","mean")

A=c(10.1,20.3,30.5,40.8,50.1,60.7)
Sample_1=rep(41, each=6)

B=c(15.8,25.7,35.2,45.1,55.5,65.7)
Sample_2=rep(49, each=6)

C=c(5.3,15.6,25.4,35.5,45.1,55.9)
Sample_3=rep(50, each=6)

D=c(25.1,35.3,45.2,55.4,65.9,75.6)
Sample_4=rep(59, each=6)

my_data=data.frame(Stat=Stat, A=A, Sample_1=Sample_1, B=B, Sample_2=Sample_2, C=C, Sample_3=Sample_3, D=D, Sample_4=Sample_4)

It gives a table that look like this:

Stat A Sample_1 B Sample_2 C Sample_3 D Sample_4
Min 10.1 41 15.8 49 5.3 50 25.1 59
Q1 20.3 41 25.7 49 15.6 50 35.3 59
Med 30.5 41 35.2 49 25.4 50 45.2 59
Q3 40.8 41 45.1 49 35.5 50 55.4 59
Max 50.1 41 55.5 49 45.1 50 65.9 59
mean 60.7 41 65.7 49 55.9 50 75.6 59

An I would like to transform it to get the following table:

Stat Tested_values Percentage Sample
Min A 10.1 41
Q1 A 20.3 41
Med A 30.5 41
Q3 A 40.8 41
Max A 50.1 41
Mean A 60.7 41
Min B 10.1 49
Q1 B 20.3 49
Med B 30.5 49
Q3 B 40.8 49
Max B 50.1 49
Mean B 60.7 49
. . . .
. . . .
Mean D 75.6 59

I tried with the function pivot_longer but was not able to get 2 separate columns

Upvotes: 2

Views: 2440

Answers (2)

aosmith
aosmith

Reputation: 36084

I'm not sure how to get where you want to using the pivot family of functions with your current dataset names.

However, if your column names followed a pattern, where we could tell which test group each column was related to, pivot_longer() could work for you.

Here's your same dataset with new column names to indicate which values are stored in the column (Obs vs Sample) and which test groups each column is related to (the group letters). The separator between the values name and group indicator is a "_".

my_data = data.frame(Stat = Stat, 
                   Obs_A = A, Sample_A = Sample_1, 
                   Obs_B = B, Sample_B = Sample_2, 
                   Obs_C = C, Sample_C = Sample_3, 
                   Obs_D = D, Sample_D = Sample_4)

my_data
#>   Stat Obs_A Sample_A Obs_B Sample_B Obs_C Sample_C Obs_D Sample_D
#> 1  Min  10.1       41  15.8       49   5.3       50  25.1       59
#> 2   Q1  20.3       41  25.7       49  15.6       50  35.3       59
#> 3  Med  30.5       41  35.2       49  25.4       50  45.2       59
#> 4   Q3  40.8       41  45.1       49  35.5       50  55.4       59
#> 5  Max  50.1       41  55.5       49  45.1       50  65.9       59
#> 6 mean  60.7       41  65.7       49  55.9       50  75.6       59

With that change we can pivot multiple columns with pivot_longer() to work with the sets of columns related to the same group.

In the code below you'll note the special .value sentinel in the names_to that, per the documentation, "indicates that component of the column name that defines the name of the column containing the cell values...". Then names_sep indicates how the column names are separated into value names and group indicator.

my_data %>% 
    pivot_longer(-Stat, names_to = c(".value", "Tested value"), names_sep = "_")
#> # A tibble: 24 x 4
#>    Stat  `Tested value`   Obs Sample
#>    <chr> <chr>          <dbl>  <dbl>
#>  1 Min   A               10.1     41
#>  2 Min   B               15.8     49
#>  3 Min   C                5.3     50
#>  4 Min   D               25.1     59
#>  5 Q1    A               20.3     41
#>  6 Q1    B               25.7     49
#>  7 Q1    C               15.6     50
#>  8 Q1    D               35.3     59
#>  9 Med   A               30.5     41
#> 10 Med   B               35.2     49
#> # ... with 14 more rows

Created on 2021-10-22 by the reprex package (v2.0.0)

See more examples at the bottom of the "Examples" section of the documentation of pivot_longer(). These and other examples online always seem to involve a pattern in the column names to allow us to keep related columns together.

Upvotes: 2

dario
dario

Reputation: 6483

You'd need two separate pivot_longer calls, one for each new column.

library(dplyr)
library(tidyr)

my_data %>% 
  pivot_longer(c(A, B, C, D), names_to = "Tested_Values", values_to = "Percentage") %>% 
  pivot_longer(starts_with("Sample"), values_to = "Sample") %>% 
  select(-name)

Returns:

   Stat Tested_Values Percentage Sample
1   Min             A       10.1     41
2   Min             A       10.1     49
3   Min             A       10.1     50
4   Min             A       10.1     59
5   Min             B       15.8     41
6   Min             B       15.8     49
7   Min             B       15.8     50
8   Min             B       15.8     59
9   Min             C        5.3     41
10  Min             C        5.3     49
11  Min             C        5.3     50
12  Min             C        5.3     59
13  Min             D       25.1     41
14  Min             D       25.1     49
15  Min             D       25.1     50
16  Min             D       25.1     59
17   Q1             A       20.3     41
18   Q1             A       20.3     49
19   Q1             A       20.3     50
20   Q1             A       20.3     59
21   Q1             B       25.7     41
22   Q1             B       25.7     49
23   Q1             B       25.7     50
24   Q1             B       25.7     59
25   Q1             C       15.6     41
26   Q1             C       15.6     49
27   Q1             C       15.6     50
28   Q1             C       15.6     59
29   Q1             D       35.3     41
30   Q1             D       35.3     49
31   Q1             D       35.3     50
32   Q1             D       35.3     59
33  Med             A       30.5     41
34  Med             A       30.5     49
35  Med             A       30.5     50
36  Med             A       30.5     59
37  Med             B       35.2     41
38  Med             B       35.2     49
39  Med             B       35.2     50
40  Med             B       35.2     59
41  Med             C       25.4     41
42  Med             C       25.4     49
43  Med             C       25.4     50
44  Med             C       25.4     59
45  Med             D       45.2     41
46  Med             D       45.2     49
47  Med             D       45.2     50
48  Med             D       45.2     59
49   Q3             A       40.8     41
50   Q3             A       40.8     49
51   Q3             A       40.8     50
52   Q3             A       40.8     59
53   Q3             B       45.1     41
54   Q3             B       45.1     49
55   Q3             B       45.1     50
56   Q3             B       45.1     59
57   Q3             C       35.5     41
58   Q3             C       35.5     49
59   Q3             C       35.5     50
60   Q3             C       35.5     59
61   Q3             D       55.4     41
62   Q3             D       55.4     49
63   Q3             D       55.4     50
64   Q3             D       55.4     59
65  Max             A       50.1     41
66  Max             A       50.1     49
67  Max             A       50.1     50
68  Max             A       50.1     59
69  Max             B       55.5     41
70  Max             B       55.5     49
71  Max             B       55.5     50
72  Max             B       55.5     59
73  Max             C       45.1     41
74  Max             C       45.1     49
75  Max             C       45.1     50
76  Max             C       45.1     59
77  Max             D       65.9     41
78  Max             D       65.9     49
79  Max             D       65.9     50
80  Max             D       65.9     59
81 mean             A       60.7     41
82 mean             A       60.7     49
83 mean             A       60.7     50
84 mean             A       60.7     59
85 mean             B       65.7     41
86 mean             B       65.7     49
87 mean             B       65.7     50
88 mean             B       65.7     59
89 mean             C       55.9     41
90 mean             C       55.9     49
91 mean             C       55.9     50
92 mean             C       55.9     59
93 mean             D       75.6     41
94 mean             D       75.6     49
95 mean             D       75.6     50
96 mean             D       75.6     59

Upvotes: 1

Related Questions