cms72
cms72

Reputation: 177

How do I reshape a dataframe so that the items in each column become a column header?

I hope you can help - I've tried to search for questions like this but I might be searching for the wrong terms because I can't find anything.

Basically, I have a dataframe with column headers Boat, Car and Train..

ID <- c(1,2,3,4,5,6)
Boat  <- c("Green, Blue", "Green", "Blue","Pink, Blue, Green","Blue","Blue, Green")
Car <- c("White","White","","","White","")
Train <-c("Grey","","Grey","Grey","","Grey")
df <-data.frame(ID,Boat,Car,Train)

Which gives the below table

ID      Boat              Car            Train
1   Green, Blue          White          Grey
2      Green             White   
3      Blue                             Grey
4   Pink, Blue, Green                   Grey
5   Blue                 White
6   Blue, Green                         Grey

And I would like to convert each unique value (regardless of order, i.e Blue, Green or Green, Blue) within each column into a column header and flip the original column headers to become the values of each corresponding column so that it looks like the table below

ID    Green    Blue      Pink     White     Grey
1     Boat     Boat               Car       Train
2     Boat                        Car
3              Boat                         Train
4     Boat     Boat      Boat               Train
5              Boat               Car
6     Boat     Boat                         Train

Upvotes: 0

Views: 221

Answers (2)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

Here's an approach using cSplit from "splitstackshape" and melt + dcast from "data.table":

library(data.table)
library(splitstackshape)

cSplit(melt(as.data.table(df), 1), "value", ",", "long")[, dcast(.SD, ID ~ value, value.var = "variable")]
#    ID Blue Green  Grey Pink White
# 1:  1 Boat  Boat Train <NA>   Car
# 2:  2 <NA>  Boat  <NA> <NA>   Car
# 3:  3 Boat  <NA> Train <NA>  <NA>
# 4:  4 Boat  Boat Train Boat  <NA>
# 5:  5 Boat  <NA>  <NA> <NA>   Car
# 6:  6 Boat  Boat Train <NA>  <NA>

Upvotes: 2

Martin Gal
Martin Gal

Reputation: 16978

You could use tidyr, dplyr and stringr:

df %>%
  pivot_longer(cols=c("Boat", "Car", "Train"), names_to="type", values_to="color") %>%
  mutate(color=str_split(color, ", ")) %>%
  unnest(color) %>%
  filter(color!="") %>%
  pivot_wider(names_from=color, values_from=type) %>%
  select(ID, Green, Blue, Pink, White, Grey)

returns

# A tibble: 6 x 6
     ID Green Blue  Pink  White Grey 
  <dbl> <chr> <chr> <chr> <chr> <chr>
1     1 Boat  Boat  NA    Car   Train
2     2 Boat  NA    NA    Car   NA   
3     3 NA    Boat  NA    NA    Train
4     4 Boat  Boat  Boat  NA    Train
5     5 NA    Boat  NA    Car   NA   
6     6 Boat  Boat  NA    NA    Train

Upvotes: 3

Related Questions