Reputation: 21
I have seen some possible discussion of my problem elsewhere but it either wasn't resolved or I could not fully understand if the answer applied, so I'm creating a new question.
The following question in particular touches on this subject but is not resolved. Gathering wide columns into multiple long columns using pivot_longer
Take the following sample data. As you can see there is a unique identifier variable, and then 8 other variables. Of the other 8, you can group them into two sets, gpa and percent_a. For each set there is a class, group, course, and dept value.
In my actual data I have about 20 different sets, all with the same structure, the same four descriptors in each set.
What I would like to do is perform a function similar to pivot_longer. Except instead of combining multiple columns into a set of key and value columns, each unique set in my data (class, group, course, dept) would be grouped into there own key/value columns.
set.seed(101)
df <- data.frame(
id = 1:10,
class_gpa = rnorm(10, 0, 1),
course_gpa = rnorm(10, 0, 1),
group_gpa = rnorm(10, 0, 1),
dept_gpa = rnorm(10, 0, 1),
class_percent_a = rnorm(10, 0, 1),
course_percent_a = rnorm(10, 0, 1),
group_percent_a = rnorm(10, 0, 1),
dept_percent_a = rnorm(10, 0, 1)
)
So in this example, lets say I group all of the gpa values into two columns (gpa_type, and gpa_value) and the percent_a values into two columns (percent_a_type, percent_a_value), then I would end up at the end with only 5 columns:
id, gpa_type, gpa_value, percent_a_type, percent_a_value
Is there a way to do this? Either with pivot_longer or another method. Thanks.
Upvotes: 2
Views: 2628
Reputation: 21641
Honestly, I would rather simply do:
df %>% pivot_longer(-id, names_to = c("type", ".value"), names_pattern = "([^_]+)_(.*)")
And keep the data into a more practical format:
# A tibble: 40 x 4
id type gpa percent_a
<int> <chr> <dbl> <dbl>
1 1 class -0.326 0.482
2 1 course 0.526 -1.15
3 1 group -0.164 -0.260
4 1 dept 0.895 1.51
5 2 class 0.552 0.758
6 2 course -0.795 -0.274
7 2 group 0.709 -1.41
8 2 dept 0.279 1.62
9 3 class -0.675 -2.32
10 3 course 1.43 0.578
# … with 30 more rows
Why duplicate the "type" attribute for each "set"?
For your desired output:
# A tibble: 40 x 5
id gpa_type gpa_value percent_a_type percent_a_value
<int> <chr> <dbl> <chr> <dbl>
1 1 class -0.326 class 0.482
2 1 course 0.526 course -1.15
3 1 group -0.164 group -0.260
4 1 dept 0.895 dept 1.51
5 2 class 0.552 class 0.758
6 2 course -0.795 course -0.274
7 2 group 0.709 group -1.41
8 2 dept 0.279 dept 1.62
9 3 class -0.675 class -2.32
10 3 course 1.43 course 0.578
# … with 30 more rows
You could try:
lst_df <- df %>%
gather(key, value, -id) %>%
extract(key, into = c("var", "type"), "([^_]+)_(.*)") %>%
split(.$type)
names(lst_df) %>%
map_dfc(~ setNames(
lst_df[[.x]] %>%
select(-type),
c("id", paste0(.x, c("_type", "_value"))))) %>%
select(-matches("id\\d+"))
Upvotes: 3