I want to split a list column into separate columns based on an automatically generated dictionary / index / glossary (not sure what to call it).
I have a data frame whose last column is a list of character vectors. Some elements contain 3 strings, some 20, and others none. The data looks something like this:
name age category
1 John 34 c('sports', 'USA')
2 Mary 20 c('model', 'sports', 'Canada')
3 Sue 65 c('scholar', 'USA')
4 Carl 12 NA
n ... .. ...
The data is very long and I do not know in advance what to look for; that is, I don't have a predefined list of strings. I want R to generate this list of strings for me.
For that I've already tried:
> category.frq <- table(unlist(category))
> cbind(names(category.frq),as.integer(category.frq))
This gives me a convenient word count and index, but I am new to R, so I am not sure how to proceed from there. Is there a package that can do this for me?
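To make the attempt above concrete, here is a minimal sketch that rebuilds the sample data from the question and runs the same two lines on it. The data frame name df is an assumption (the question only refers to category). The point it illustrates: names(category.frq) already is the automatically generated list of strings; what is missing is checking each row against it.

```r
# Sample data rebuilt from the question; the name `df` is an assumption.
df <- data.frame(name = c("John", "Mary", "Sue", "Carl"),
                 age  = c(34, 20, 65, 12),
                 category = I(list(c("sports", "USA"),
                                   c("model", "sports", "Canada"),
                                   c("scholar", "USA"),
                                   NA)))

# unlist() flattens the list column into one character vector;
# table() counts each distinct string (Carl's NA is not counted by default).
category.frq <- table(unlist(df$category))

# names(category.frq) is the generated "dictionary" of distinct strings
cbind(names(category.frq), as.integer(category.frq))
```

The order of the strings in the table follows the current locale's sort order, so it may differ between machines.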
I would ideally have this result:
name age category sports USA model ...
1 John 34 c('sports', 'USA') 1 1 NA
2 Mary 20 c('model', 'sports', 'Canada') 1 NA 1
3 Sue 65 c('scholar', 'USA') NA 1 NA
4 Carl 12 NA NA NA NA
n ... .. ... .. .. ..
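A slightly more in-depth exposition of @Akrun's comment: name each list element by its row number, reshape the list into long format with stack(), and cross-tabulate strings against rows with table().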
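df1 <- data.frame(category = I(list(c('a','b','c', 'a'),
                                     c('b','d'),
                                     c('b', 'e', 'f', 'd'),
                                     c('g','h'),
                                     NA)))
# Name each list element by its row number so stack() records where each string came from
l <- df1$category
names(l) <- seq_along(l)
# stack() returns a long data frame with columns values and ind;
# table() counts each string per row, t() puts the rows first,
# and as.data.frame.matrix() keeps the wide (one column per string) layout
df2 <- as.data.frame.matrix(t(table(stack(l))))
# Show absent strings as NA instead of 0
df2[df2 == 0] <- NA
df1 <- cbind(df1, df2)
df1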
# category a b c d e f g h
#1 a, b, c, a 2 1 1 NA NA NA NA NA
#2 b, d NA 1 NA 1 NA NA NA NA
#3 b, e, f, d NA 1 NA 1 1 1 NA NA
#4 g, h NA NA NA NA NA NA 1 1
#5 NA NA NA NA NA NA NA NA NA
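The table() approach above stores counts (row 1 gets a = 2), whereas the desired result in the question shows 1 for present and NA for absent, next to the name and age columns. Below is a minimal base-R sketch of that layout, offered as an alternative to the stack() approach rather than as @Akrun's method. It assumes the question's sample data (rebuilt here); df, vocab, ind and out are hypothetical names. The dictionary is generated from the data itself, and each row is tested against it with %in%.

```r
# Sample data rebuilt from the question; df, vocab, ind and out are hypothetical names.
df <- data.frame(name = c("John", "Mary", "Sue", "Carl"),
                 age  = c(34, 20, 65, 12),
                 category = I(list(c("sports", "USA"),
                                   c("model", "sports", "Canada"),
                                   c("scholar", "USA"),
                                   NA)))

# The "dictionary": every distinct string in the column.
# sort() drops NA by default (na.last = NA).
vocab <- sort(unique(unlist(df$category)))

# One integer vector per row: 1L if the string occurs in that row, NA otherwise.
# rbind() stacks them into a matrix with one row per observation.
ind <- do.call(rbind, lapply(df$category,
                             function(x) ifelse(vocab %in% x, 1L, NA_integer_)))
colnames(ind) <- vocab

# Append one column per string to the original data
out <- cbind(df, ind)
out
```

Replacing the ifelse() call with as.integer(vocab %in% x) would give 0/1 indicators instead of 1/NA.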