I am looking at converting the raw table below into something where the column with the subjects names become rows themselves in a new column called Subject. Raw data: ID Year Math_sub Eng_sub Chem_sub 1 2010 20 15 12 1 2015 17 20 14 1 2016 13 17 12 3 2015 14 13 NA 3 2016 12 12 NA 5 2010 20 15 13 5 2015 17 20 14 5 2016 NA 17 12 9 2015 14 13 NA 9 2016 15 15 16 with many more random IDs. Desired output: ID Year Math_sub Eng_sub Chem_sub Subject 1 2010 20 NA NA Maths 1 2010 NA 15 NA English 1 2010 NA NA 12 Chemistry 1 2015 17 NA NA Maths 1 2015 NA 20 NA English 1 2015 NA NA 14 Chemistry 1 2016 13 NA NA Maths 1 2016 NA 17 NA English 1 2016 NA NA 12 Chemistry 3 2015 14 NA NA Maths 3 2015 NA 13 NA English 3 2015 NA NA NA Chemistry 3 2016 12 NA NA Maths 3 2016 NA 12 NA English 3 2016 NA NA NA Chemistry I tried a lot of approaches using both the reshape and tidyr library but they are producing an output which is completely miles away from what I need. Any help will be very much appreciated. Thank you.

Reputation: 71

R convert column name to rows with some formatting

I am looking at converting the raw table below into something where the column with the subjects names become rows themselves in a new column called Subject.

Raw data:

ID	Year	Math_sub	Eng_sub	Chem_sub
1	2010	20	15	12
1	2015	17	20	14
1	2016	13	17	12
3	2015	14	13	NA
3	2016	12	12	NA
5	2010	20	15	13
5	2015	17	20	14
5	2016	NA	17	12
9	2015	14	13	NA
9	2016	15	15	16

with many more random IDs.

Desired output:

ID	Year	Math_sub	Eng_sub	Chem_sub	Subject
1	2010	20	NA	NA	Maths
1	2010	NA	15	NA	English
1	2010	NA	NA	12	Chemistry
1	2015	17	NA	NA	Maths
1	2015	NA	20	NA	English
1	2015	NA	NA	14	Chemistry
1	2016	13	NA	NA	Maths
1	2016	NA	17	NA	English
1	2016	NA	NA	12	Chemistry
3	2015	14	NA	NA	Maths
3	2015	NA	13	NA	English
3	2015	NA	NA	NA	Chemistry
3	2016	12	NA	NA	Maths
3	2016	NA	12	NA	English
3	2016	NA	NA	NA	Chemistry

I tried a lot of approaches using both the reshape and tidyr library but they are producing an output which is completely miles away from what I need.

Any help will be very much appreciated.

Thank you.

Upvotes: 1

Answers (2)

Jon Spring

Reputation: 66955

One approach could be to pivot longer, add a distinguishing feature to each row (eg row number), and then pivot wider again:

library(tidyverse)
dd |>
  pivot_longer(3:5) |>
  mutate(row = row_number(), subject=sub("_sub$", "", name)) |>
  pivot_wider(names_from = name, values_from = value)

(h/t @MrFlick for the reminder to add the subject column to match OP request.)

Result

      ID  Year   row subject Math_sub Eng_sub Chem_sub
   <int> <int> <int> <chr>      <int>   <int>    <int>
 1     1  2010     1 Math          20      NA       NA
 2     1  2010     2 Eng           NA      15       NA
 3     1  2010     3 Chem          NA      NA       12
 4     1  2015     4 Math          17      NA       NA
 5     1  2015     5 Eng           NA      20       NA
 6     1  2015     6 Chem          NA      NA       14
 7     1  2016     7 Math          13      NA       NA
 8     1  2016     8 Eng           NA      17       NA
 9     1  2016     9 Chem          NA      NA       12
10     3  2015    10 Math          14      NA       NA
# ℹ 20 more rows

Upvotes: 2

MrFlick

Reputation: 206586

This is an usual transformation. These steps can generate the same data if you aren't too picky about row and column order.

library(dplry)
library(purrr)

subjects <- select(dd, ends_with("_sub"))  %>% names()

map_df(subjects, 
  ~select(dd, ID, Year, all_of(.x)) |> mutate(Subject=sub("_sub$","", .x)))

We grab all the subject columns, loop over them extracting only the matching column and add a column with the name without "_sub". And then we let map_df put all the subsets back together in a single frame and add in the NA values for us.

Tested with

dd <- data.frame(ID = c(1L, 1L, 1L, 3L, 3L, 5L, 5L, 5L, 9L, 9L), 
    Year = c(2010L, 2015L, 2016L, 2015L, 2016L, 2010L, 2015L, 2016L, 2015L, 2016L), 
    Math_sub = c(20L, 17L, 13L, 14L, 12L, 20L, 17L, NA, 14L, 15L),
    Eng_sub = c(15L, 20L, 17L, 13L, 12L, 15L, 20L, 17L, 13L, 15L),
    Chem_sub = c(12L, 14L, 12L, NA, NA, 13L, 14L, 12L, NA, 16L)
)

Upvotes: 1

R convert column name to rows with some formatting

Answers (2)

Related Questions