Joseph Harvard
Joseph Harvard

Reputation: 37

To pivot a multiple column data frame in R, into one column, and splitting the original column name

My data is a more complex but simplified it basically looks like this. Each row has only one Yes in one of the multiple Color:* columns. The "Color" before the colon needs to be the new column name, with the named color "Blue", "Red", or "Green" after the colon pivoted under "Color" to the correct "IDENTITY" I forgot to mention there are variable columns before and after these multiple columns I want to pivot into one column. the columns I want to pivot all have the same name before the colon.

IDENTITY Color:Blue Color:Red Color:Green
1 Yes
2 Yes
3 Yes

and what I would like is to pivot to this

IDENTITY Color
1 Green
2 Blue
3 Red

I am not sure if this is a pivot problem. I have read through the tidyr pivot documentation at

https://tidyr.tidyverse.org/articles/pivot.html

I do not see a similar example or able to identify one of the solutions that might work with my supplied data.

Can anyone help me with a code chunk I can follow to solve what seems on the face of it a simple problem, but eludes my limited proficiency with R. Thank you and Seasons Blessings

Upvotes: 0

Views: 1267

Answers (2)

akrun
akrun

Reputation: 886938

We may convert the blanks ("") to NA, in character columns by looping across the character columns, then reshape to 'long' by selecting the cols that have column names starts_with "Color:"

library(tidyr)
library(dplyr)
df1 %>%
   mutate(across(where(is.character), ~ na_if(.x, ""))) %>% 
   pivot_longer(cols = starts_with("Color:"), names_to = c(".value", "Color2"), 
     names_sep = ":", values_drop=TRUE) %>% 
  select(IDENTITY, Color = Color2)

-output

# A tibble: 3 × 2
  IDENTITY Color
     <int> <chr>
1        1 Green
2        2 Blue 
3        3 Red  

Upvotes: 2

Adrian Fletcher
Adrian Fletcher

Reputation: 170

You would need to use the pivot_longer function. This will gather columns and turn them into rows.

Note: I had to put `` in front of the column names since : is one of those pesky reserved characters in R. You want to avoid having that as a column separator ( _ could be a useful substitution). You can't always control this, but it's just something to look out for.

library(tidyverse)

x <- tribble( ~IDENTITY, ~`Color:Blue`, ~`Color:Red`, ~`Color:Green`,
              1L, "", "", "Yes",
              2L, "Yes", "", "",
              3L, "" , "Yes", "")

x %>%
  rename_with(stringr::str_replace,
              pattern = "Color:",
              replacement = "") %>%
  pivot_longer(!IDENTITY,
               names_to = "Color") %>%
  filter(value != "") %>%
  select(!value)

Upvotes: 1

Related Questions