theD
theD

Reputation: 71

R convert column name to rows with some formatting

I am looking at converting the raw table below into something where the column with the subjects names become rows themselves in a new column called Subject.

Raw data:

ID Year Math_sub Eng_sub Chem_sub
1 2010 20 15 12
1 2015 17 20 14
1 2016 13 17 12
3 2015 14 13 NA
3 2016 12 12 NA
5 2010 20 15 13
5 2015 17 20 14
5 2016 NA 17 12
9 2015 14 13 NA
9 2016 15 15 16

with many more random IDs.

Desired output:

ID Year Math_sub Eng_sub Chem_sub Subject
1 2010 20 NA NA Maths
1 2010 NA 15 NA English
1 2010 NA NA 12 Chemistry
1 2015 17 NA NA Maths
1 2015 NA 20 NA English
1 2015 NA NA 14 Chemistry
1 2016 13 NA NA Maths
1 2016 NA 17 NA English
1 2016 NA NA 12 Chemistry
3 2015 14 NA NA Maths
3 2015 NA 13 NA English
3 2015 NA NA NA Chemistry
3 2016 12 NA NA Maths
3 2016 NA 12 NA English
3 2016 NA NA NA Chemistry

I tried a lot of approaches using both the reshape and tidyr library but they are producing an output which is completely miles away from what I need.

Any help will be very much appreciated.

Thank you.

Upvotes: 1

Views: 60

Answers (2)

Jon Spring
Jon Spring

Reputation: 66955

One approach could be to pivot longer, add a distinguishing feature to each row (eg row number), and then pivot wider again:

library(tidyverse)
dd |>
  pivot_longer(3:5) |>
  mutate(row = row_number(), subject=sub("_sub$", "", name)) |>
  pivot_wider(names_from = name, values_from = value)

(h/t @MrFlick for the reminder to add the subject column to match OP request.)

Result

      ID  Year   row subject Math_sub Eng_sub Chem_sub
   <int> <int> <int> <chr>      <int>   <int>    <int>
 1     1  2010     1 Math          20      NA       NA
 2     1  2010     2 Eng           NA      15       NA
 3     1  2010     3 Chem          NA      NA       12
 4     1  2015     4 Math          17      NA       NA
 5     1  2015     5 Eng           NA      20       NA
 6     1  2015     6 Chem          NA      NA       14
 7     1  2016     7 Math          13      NA       NA
 8     1  2016     8 Eng           NA      17       NA
 9     1  2016     9 Chem          NA      NA       12
10     3  2015    10 Math          14      NA       NA
# ℹ 20 more rows

Upvotes: 2

MrFlick
MrFlick

Reputation: 206586

This is an usual transformation. These steps can generate the same data if you aren't too picky about row and column order.

library(dplry)
library(purrr)

subjects <- select(dd, ends_with("_sub"))  %>% names()

map_df(subjects, 
  ~select(dd, ID, Year, all_of(.x)) |> mutate(Subject=sub("_sub$","", .x)))

We grab all the subject columns, loop over them extracting only the matching column and add a column with the name without "_sub". And then we let map_df put all the subsets back together in a single frame and add in the NA values for us.

Tested with

dd <- data.frame(ID = c(1L, 1L, 1L, 3L, 3L, 5L, 5L, 5L, 9L, 9L), 
    Year = c(2010L, 2015L, 2016L, 2015L, 2016L, 2010L, 2015L, 2016L, 2015L, 2016L), 
    Math_sub = c(20L, 17L, 13L, 14L, 12L, 20L, 17L, NA, 14L, 15L),
    Eng_sub = c(15L, 20L, 17L, 13L, 12L, 15L, 20L, 17L, 13L, 15L),
    Chem_sub = c(12L, 14L, 12L, NA, NA, 13L, 14L, 12L, NA, 16L)
)

Upvotes: 1

Related Questions