Reputation: 71
I am looking at converting the raw table below into something where the column with the subjects names become rows themselves in a new column called Subject.
Raw data:
ID | Year | Math_sub | Eng_sub | Chem_sub |
---|---|---|---|---|
1 | 2010 | 20 | 15 | 12 |
1 | 2015 | 17 | 20 | 14 |
1 | 2016 | 13 | 17 | 12 |
3 | 2015 | 14 | 13 | NA |
3 | 2016 | 12 | 12 | NA |
5 | 2010 | 20 | 15 | 13 |
5 | 2015 | 17 | 20 | 14 |
5 | 2016 | NA | 17 | 12 |
9 | 2015 | 14 | 13 | NA |
9 | 2016 | 15 | 15 | 16 |
with many more random IDs.
Desired output:
ID | Year | Math_sub | Eng_sub | Chem_sub | Subject |
---|---|---|---|---|---|
1 | 2010 | 20 | NA | NA | Maths |
1 | 2010 | NA | 15 | NA | English |
1 | 2010 | NA | NA | 12 | Chemistry |
1 | 2015 | 17 | NA | NA | Maths |
1 | 2015 | NA | 20 | NA | English |
1 | 2015 | NA | NA | 14 | Chemistry |
1 | 2016 | 13 | NA | NA | Maths |
1 | 2016 | NA | 17 | NA | English |
1 | 2016 | NA | NA | 12 | Chemistry |
3 | 2015 | 14 | NA | NA | Maths |
3 | 2015 | NA | 13 | NA | English |
3 | 2015 | NA | NA | NA | Chemistry |
3 | 2016 | 12 | NA | NA | Maths |
3 | 2016 | NA | 12 | NA | English |
3 | 2016 | NA | NA | NA | Chemistry |
I tried a lot of approaches using both the reshape and tidyr library but they are producing an output which is completely miles away from what I need.
Any help will be very much appreciated.
Thank you.
Upvotes: 1
Views: 60
Reputation: 66955
One approach could be to pivot longer, add a distinguishing feature to each row (eg row number), and then pivot wider again:
library(tidyverse)
dd |>
pivot_longer(3:5) |>
mutate(row = row_number(), subject=sub("_sub$", "", name)) |>
pivot_wider(names_from = name, values_from = value)
(h/t @MrFlick for the reminder to add the subject column to match OP request.)
Result
ID Year row subject Math_sub Eng_sub Chem_sub
<int> <int> <int> <chr> <int> <int> <int>
1 1 2010 1 Math 20 NA NA
2 1 2010 2 Eng NA 15 NA
3 1 2010 3 Chem NA NA 12
4 1 2015 4 Math 17 NA NA
5 1 2015 5 Eng NA 20 NA
6 1 2015 6 Chem NA NA 14
7 1 2016 7 Math 13 NA NA
8 1 2016 8 Eng NA 17 NA
9 1 2016 9 Chem NA NA 12
10 3 2015 10 Math 14 NA NA
# ℹ 20 more rows
Upvotes: 2
Reputation: 206586
This is an usual transformation. These steps can generate the same data if you aren't too picky about row and column order.
library(dplry)
library(purrr)
subjects <- select(dd, ends_with("_sub")) %>% names()
map_df(subjects,
~select(dd, ID, Year, all_of(.x)) |> mutate(Subject=sub("_sub$","", .x)))
We grab all the subject columns, loop over them extracting only the matching column and add a column with the name without "_sub". And then we let map_df
put all the subsets back together in a single frame and add in the NA values for us.
Tested with
dd <- data.frame(ID = c(1L, 1L, 1L, 3L, 3L, 5L, 5L, 5L, 9L, 9L),
Year = c(2010L, 2015L, 2016L, 2015L, 2016L, 2010L, 2015L, 2016L, 2015L, 2016L),
Math_sub = c(20L, 17L, 13L, 14L, 12L, 20L, 17L, NA, 14L, 15L),
Eng_sub = c(15L, 20L, 17L, 13L, 12L, 15L, 20L, 17L, 13L, 15L),
Chem_sub = c(12L, 14L, 12L, NA, NA, 13L, 14L, 12L, NA, 16L)
)
Upvotes: 1