Javi_VM
Javi_VM

Reputation: 515

create list from characters in R tibble

I have a tibble with a character column. The character in each row is a set of words like this: "type:mytype,variable:myvariable,variable:myothervariable:asubvariableofthisothervariable". Things like that. I want to either convert this into columns in my tibble (a column "type", a column "variable", and so on; but then I don't really know what to do with my 3rd level words), or convert it to a column list x, so that x has a structure of sublists: x$type, x$variable, x$variable$myothervariable.

I'm not sure what is the best approach, but also, I don't know how to implement this two approaches that I suggest here. I have to say that I have maximum 3 levels, and more 1st level words than "type" and "variable".

Small Reproducible Example:

df <- tibble()
df$id<- 1:3
df$keywords <- c(
  "type:novel,genre:humor:black,year:2010"
  "type:dictionary,language:english,type:bilingual,otherlang:french"
  "type:essay,topic:philosophy:purposeoflife,year:2005"
)

# expected would be in idea 1: 
colnames(df)
# n, keywords, type, genre, year, 
# language, otherlang, topic

# on idea 2:
colnames(df)
# n, keywords, keywords.as.list

Upvotes: 1

Views: 457

Answers (1)

akrun
akrun

Reputation: 886948

We can use separate_rows from tidyr to split the 'keywords' column by ,, then with cSplit, split the column 'keywords' into multiple columns at :, reshape to 'long' format with pivot_longer and then reshape back to 'wide' with pivot_wider

library(dplyr)
library(tidyr)
library(data.table)
library(splitstackshape)
df %>% 
   separate_rows(keywords, sep=",") %>%
   cSplit("keywords", ":") %>% 
   pivot_longer(cols = keywords_2:keywords_3, values_drop_na = TRUE) %>% 
   select(-name) %>%
   mutate(rn = rowid(id, keywords_1)) %>%
   pivot_wider(names_from = keywords_1, values_from = value) %>%
   select(-rn) %>%
   type.convert(as.is = TRUE)

-output

# A tibble: 6 x 7
#     id type       genre  year language otherlang topic        
#  <int> <chr>      <chr> <int> <chr>    <chr>     <chr>        
#1     1 novel      humor  2010 <NA>     <NA>      <NA>         
#2     1 <NA>       black    NA <NA>     <NA>      <NA>         
#3     2 dictionary <NA>     NA english  french    <NA>         
#4     2 bilingual  <NA>     NA <NA>     <NA>      <NA>         
#5     3 essay      <NA>   2005 <NA>     <NA>      philosophy   
#6     3 <NA>       <NA>     NA <NA>     <NA>      purposeoflife

data

df <- structure(list(id = 1:3, keywords = c("type:novel,genre:humor:black,year:2010", 
"type:dictionary,language:english,type:bilingual,otherlang:french", 
"type:essay,topic:philosophy:purposeoflife,year:2005")), row.names = c(NA, 
-3L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 1

Related Questions