Reputation: 335
I have a data frame in R, let's call it data.
One of the columns, data$tags, contains strings. Each string is a comma-separated list of tags (or categories that this entry relates to).
I'm trying to get a list of all available tags in the data frame.
I thought I could use one of the apply functions to run the column through strsplit and get one long concatenated vector with all the string parts, then use unique to get rid of the duplicates.
I tried:
func_split_tags <- function(e) {
  return(unlist(strsplit(e, " ")))
}
all_tags <- sapply(as.vector(data$tags), func_split_tags)
but that just gives me a list of the split-string vectors.
Does anyone have any idea how to make this work?
Thanks!
Upvotes: 0
Views: 799
Reputation: 887078
We could do this with str_extract_all from the stringr package:
library(stringr)
unlist(str_extract_all(df$s, "\\w+"))
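Note that "\\w+" matches runs of letters, digits and underscores, so a tag containing a space or a hyphen would be broken into several pieces. If that matters, one option is to split on the comma instead. The sketch below uses made-up tag strings (the vector `tags` is hypothetical) and assumes tags are separated by a comma optionally followed by whitespace:

```r
library(stringr)

# Hypothetical tag strings, for illustration only.
tags <- c("machine learning, r", "r,data-viz")

# "\\w+" breaks on every non-word character, including spaces and hyphens.
words <- unlist(str_extract_all(tags, "\\w+"))

# Splitting on the comma keeps multi-word tags intact; unique() drops repeats.
all_tags <- unique(unlist(str_split(tags, ",\\s*")))
```

Here `words` ends up with six single-word pieces, while `all_tags` keeps "machine learning" and "data-viz" as whole tags.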
Upvotes: 0
Reputation: 10483
Is something like this what you are looking for?
df <- data.frame(x = 1:10, s = 'I am in the city', stringsAsFactors = FALSE)
as.character(unlist(sapply(df$s, function(x) strsplit(x, ' '))))
If you don't need anything more than a simple strsplit, you could write that last line as:
unlist(strsplit(df$s, ' '))
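Applied to the comma-separated tags column from the question, the same idea might look like the minimal sketch below. The sample data frame and its tag values are made up for illustration, and the separator is assumed to be a comma, possibly followed by a space:

```r
# Hypothetical sample data; the real data$tags column is assumed to look similar.
data <- data.frame(
  id   = 1:3,
  tags = c("r, strings", "strings,apply", "r"),
  stringsAsFactors = FALSE
)

# strsplit() is vectorised over its input and returns one character vector per row.
parts <- strsplit(data$tags, ",", fixed = TRUE)

# Flatten the list, trim stray whitespace, then drop duplicates.
all_tags <- unique(trimws(unlist(parts)))
all_tags
```

sapply returns a list whenever the pieces differ in length from row to row, which is why the result has to be flattened with unlist before calling unique.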
Upvotes: 2