Reputation: 33
I have the following data frame:
Stat=c("Min","Q1","Med","Q3","Max","mean")
A=c(10.1,20.3,30.5,40.8,50.1,60.7)
Sample_1=rep(41, each=6)
B=c(15.8,25.7,35.2,45.1,55.5,65.7)
Sample_2=rep(49, each=6)
C=c(5.3,15.6,25.4,35.5,45.1,55.9)
Sample_3=rep(50, each=6)
D=c(25.1,35.3,45.2,55.4,65.9,75.6)
Sample_4=rep(59, each=6)
my_data=data.frame(Stat=Stat, A=A, Sample_1=Sample_1, B=B, Sample_2=Sample_2, C=C, Sample_3=Sample_3, D=D, Sample_4=Sample_4)
It gives a table that look like this:
Stat | A | Sample_1 | B | Sample_2 | C | Sample_3 | D | Sample_4 |
---|---|---|---|---|---|---|---|---|
Min | 10.1 | 41 | 15.8 | 49 | 5.3 | 50 | 25.1 | 59 |
Q1 | 20.3 | 41 | 25.7 | 49 | 15.6 | 50 | 35.3 | 59 |
Med | 30.5 | 41 | 35.2 | 49 | 25.4 | 50 | 45.2 | 59 |
Q3 | 40.8 | 41 | 45.1 | 49 | 35.5 | 50 | 55.4 | 59 |
Max | 50.1 | 41 | 55.5 | 49 | 45.1 | 50 | 65.9 | 59 |
mean | 60.7 | 41 | 65.7 | 49 | 55.9 | 50 | 75.6 | 59 |
An I would like to transform it to get the following table:
Stat | Tested_values | Percentage | Sample |
---|---|---|---|
Min | A | 10.1 | 41 |
Q1 | A | 20.3 | 41 |
Med | A | 30.5 | 41 |
Q3 | A | 40.8 | 41 |
Max | A | 50.1 | 41 |
Mean | A | 60.7 | 41 |
Min | B | 10.1 | 49 |
Q1 | B | 20.3 | 49 |
Med | B | 30.5 | 49 |
Q3 | B | 40.8 | 49 |
Max | B | 50.1 | 49 |
Mean | B | 60.7 | 49 |
. | . | . | . |
. | . | . | . |
Mean | D | 75.6 | 59 |
I tried with the function pivot_longer but was not able to get 2 separate columns
Upvotes: 2
Views: 2440
Reputation: 36084
I'm not sure how to get where you want to using the pivot family of functions with your current dataset names.
However, if your column names followed a pattern, where we could tell which test group each column was related to, pivot_longer()
could work for you.
Here's your same dataset with new column names to indicate which values are stored in the column (Obs vs Sample) and which test groups each column is related to (the group letters). The separator between the values name and group indicator is a "_".
my_data = data.frame(Stat = Stat,
Obs_A = A, Sample_A = Sample_1,
Obs_B = B, Sample_B = Sample_2,
Obs_C = C, Sample_C = Sample_3,
Obs_D = D, Sample_D = Sample_4)
my_data
#> Stat Obs_A Sample_A Obs_B Sample_B Obs_C Sample_C Obs_D Sample_D
#> 1 Min 10.1 41 15.8 49 5.3 50 25.1 59
#> 2 Q1 20.3 41 25.7 49 15.6 50 35.3 59
#> 3 Med 30.5 41 35.2 49 25.4 50 45.2 59
#> 4 Q3 40.8 41 45.1 49 35.5 50 55.4 59
#> 5 Max 50.1 41 55.5 49 45.1 50 65.9 59
#> 6 mean 60.7 41 65.7 49 55.9 50 75.6 59
With that change we can pivot multiple columns with pivot_longer()
to work with the sets of columns related to the same group.
In the code below you'll note the special .value
sentinel in the names_to
that, per the documentation, "indicates that component of the column name that defines the name of the column containing the cell values...". Then names_sep
indicates how the column names are separated into value names and group indicator.
my_data %>%
pivot_longer(-Stat, names_to = c(".value", "Tested value"), names_sep = "_")
#> # A tibble: 24 x 4
#> Stat `Tested value` Obs Sample
#> <chr> <chr> <dbl> <dbl>
#> 1 Min A 10.1 41
#> 2 Min B 15.8 49
#> 3 Min C 5.3 50
#> 4 Min D 25.1 59
#> 5 Q1 A 20.3 41
#> 6 Q1 B 25.7 49
#> 7 Q1 C 15.6 50
#> 8 Q1 D 35.3 59
#> 9 Med A 30.5 41
#> 10 Med B 35.2 49
#> # ... with 14 more rows
Created on 2021-10-22 by the reprex package (v2.0.0)
See more examples at the bottom of the "Examples" section of the documentation of pivot_longer()
. These and other examples online always seem to involve a pattern in the column names to allow us to keep related columns together.
Upvotes: 2
Reputation: 6483
You'd need two separate pivot_longer
calls, one for each new column.
library(dplyr)
library(tidyr)
my_data %>%
pivot_longer(c(A, B, C, D), names_to = "Tested_Values", values_to = "Percentage") %>%
pivot_longer(starts_with("Sample"), values_to = "Sample") %>%
select(-name)
Returns:
Stat Tested_Values Percentage Sample 1 Min A 10.1 41 2 Min A 10.1 49 3 Min A 10.1 50 4 Min A 10.1 59 5 Min B 15.8 41 6 Min B 15.8 49 7 Min B 15.8 50 8 Min B 15.8 59 9 Min C 5.3 41 10 Min C 5.3 49 11 Min C 5.3 50 12 Min C 5.3 59 13 Min D 25.1 41 14 Min D 25.1 49 15 Min D 25.1 50 16 Min D 25.1 59 17 Q1 A 20.3 41 18 Q1 A 20.3 49 19 Q1 A 20.3 50 20 Q1 A 20.3 59 21 Q1 B 25.7 41 22 Q1 B 25.7 49 23 Q1 B 25.7 50 24 Q1 B 25.7 59 25 Q1 C 15.6 41 26 Q1 C 15.6 49 27 Q1 C 15.6 50 28 Q1 C 15.6 59 29 Q1 D 35.3 41 30 Q1 D 35.3 49 31 Q1 D 35.3 50 32 Q1 D 35.3 59 33 Med A 30.5 41 34 Med A 30.5 49 35 Med A 30.5 50 36 Med A 30.5 59 37 Med B 35.2 41 38 Med B 35.2 49 39 Med B 35.2 50 40 Med B 35.2 59 41 Med C 25.4 41 42 Med C 25.4 49 43 Med C 25.4 50 44 Med C 25.4 59 45 Med D 45.2 41 46 Med D 45.2 49 47 Med D 45.2 50 48 Med D 45.2 59 49 Q3 A 40.8 41 50 Q3 A 40.8 49 51 Q3 A 40.8 50 52 Q3 A 40.8 59 53 Q3 B 45.1 41 54 Q3 B 45.1 49 55 Q3 B 45.1 50 56 Q3 B 45.1 59 57 Q3 C 35.5 41 58 Q3 C 35.5 49 59 Q3 C 35.5 50 60 Q3 C 35.5 59 61 Q3 D 55.4 41 62 Q3 D 55.4 49 63 Q3 D 55.4 50 64 Q3 D 55.4 59 65 Max A 50.1 41 66 Max A 50.1 49 67 Max A 50.1 50 68 Max A 50.1 59 69 Max B 55.5 41 70 Max B 55.5 49 71 Max B 55.5 50 72 Max B 55.5 59 73 Max C 45.1 41 74 Max C 45.1 49 75 Max C 45.1 50 76 Max C 45.1 59 77 Max D 65.9 41 78 Max D 65.9 49 79 Max D 65.9 50 80 Max D 65.9 59 81 mean A 60.7 41 82 mean A 60.7 49 83 mean A 60.7 50 84 mean A 60.7 59 85 mean B 65.7 41 86 mean B 65.7 49 87 mean B 65.7 50 88 mean B 65.7 59 89 mean C 55.9 41 90 mean C 55.9 49 91 mean C 55.9 50 92 mean C 55.9 59 93 mean D 75.6 41 94 mean D 75.6 49 95 mean D 75.6 50 96 mean D 75.6 59
Upvotes: 1