Paul

Reputation: 223

How to split a list based on strings, automatically

I want to split a list based on an automatically generated dictionary / index / glossary (I'm not sure what to call it).

I have a data frame whose last column is a list of character vectors. Some entries contain 3 strings, some 20, others none. The data looks something like this:

     name    age    category
1    John    34     c('sports', 'USA')
2    Mary    20     c('model', 'sports', 'Canada')
3    Sue     65     c('scholar', 'USA')
4    Carl    12     NA
n    ...     ..     ...

The data is very long and I do not know in advance which strings it contains, so I cannot supply an expected list of strings. I want R to generate that list for me.

For that I've already tried:

 > category.frq <- table(unlist(category))
 > cbind(names(category.frq),as.integer(category.frq))

This gives me a convenient word count and index, but I am new to R and not sure how to proceed from there. Is there a package that can do this for me?
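For reference, here is that attempt as a self-contained sketch. The data frame `df` is a hypothetical reconstruction of the sample rows above, and `vocab` and `freq_df` are illustrative names only.

```r
# Hypothetical data frame mirroring the sample rows in the question
df <- data.frame(name = c("John", "Mary", "Sue", "Carl"),
                 age  = c(34, 20, 65, 12),
                 category = I(list(c("sports", "USA"),
                                   c("model", "sports", "Canada"),
                                   c("scholar", "USA"),
                                   NA)))

# Flatten the list column and count each distinct string;
# table() drops NA by default, so Carl's empty entry is ignored
category.frq <- table(unlist(df$category))

# The table's names are the automatically generated list of strings
vocab <- names(category.frq)

# The same information as a two-column data frame
freq_df <- data.frame(term = vocab, count = as.integer(category.frq))
freq_df
```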

I would ideally have this result:

     name    age    category                        sports   USA   model  ...
1    John    34     c('sports', 'USA')              1        1     NA
2    Mary    20     c('model', 'sports', 'Canada')  1        NA    1
3    Sue     65     c('scholar', 'USA')             NA       1     NA
4    Carl    12     NA                              NA       NA    NA
n    ...     ..     ...                             ..       ..    ..

Upvotes: 0

Views: 49

Answers (1)

tofd

Reputation: 620

A slightly more in-depth exposition of @Akrun's comment...

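# Example data: a list column in which the last entry is NA
df1 <- data.frame(category = I(list(c('a','b','c', 'a'),
                                    c('b','d'),
                                    c('b', 'e', 'f', 'd'),
                                    c('g','h'),
                                    NA)))

# Name each list element by its row number so stack() records where each string came from
l <- df1$category
names(l) <- seq_along(l)
# stack() gives long format (values, ind); table() cross-tabulates it; t() puts rows first
df2 <- as.data.frame.matrix(t(table(stack(l))))
df2[df2 == 0] <- NA   # show absent strings as NA instead of 0
df1 <- cbind(df1, df2)
df1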

#    category  a  b  c  d  e  f  g  h
#1 a, b, c, a  2  1  1 NA NA NA NA NA
#2       b, d NA  1 NA  1 NA NA NA NA
#3 b, e, f, d NA  1 NA  1  1  1 NA NA
#4       g, h NA NA NA NA NA NA  1  1
#5         NA NA NA NA NA NA NA NA NA
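
Note that `table()` counts occurrences, so a string repeated within one row (such as 'a' in row 1) shows up as 2 rather than 1. If a plain 1/NA indicator is wanted, as in the question's desired output, something like the following might work. This is only a sketch: the data frame `df` is a hypothetical reconstruction of the question's sample rows, and `vocab`, `ind` and `result` are illustrative names.

```r
# Hypothetical data frame mirroring the sample rows in the question
df <- data.frame(name = c("John", "Mary", "Sue", "Carl"),
                 age  = c(34, 20, 65, 12),
                 category = I(list(c("sports", "USA"),
                                   c("model", "sports", "Canada"),
                                   c("scholar", "USA"),
                                   NA)))

# Vocabulary: every distinct non-missing string in the list column
u <- unlist(df$category)
vocab <- sort(unique(u[!is.na(u)]))

# For each row, mark which vocabulary entries are present (1) or absent (NA).
# vapply() returns one column per row of df, so transpose to get one row per record.
ind <- t(vapply(df$category,
                function(x) ifelse(vocab %in% x, 1L, NA_integer_),
                integer(length(vocab))))
colnames(ind) <- vocab

# Attach the indicator columns to the original data
result <- cbind(df, ind)
result
```

Because `%in%` only tests membership, duplicates within a row no longer inflate the value, and a row whose category is NA ends up with NA in every indicator column.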

Upvotes: 1
