Reputation: 13334
I have a character vector (vec) like this:
[1] "super good dental associates" "cheap dentist in bel air md"
"dentures " "dentures "
"in office teeth whitening" "in office teeth whitening"
"dental gum surgery bel air, md"
[8] "dental implants" "dental implants"
"veneer teeth pictures"
I need to break this apart into individual words. I tried this:
singleWords <- strsplit(vec, ' ')[[1]]
but I only get the split of the first element of that vector:
[1] "super" "good" "dental" "associates"
How can I get a single vector of ALL the words as individual elements?
Upvotes: 3
Views: 1895
Reputation: 99331
Just to confirm my comment, and since you mentioned it wasn't working, take a look. Since a couple of the elements have extra spaces, I would recommend using \\s+ as the regex to split on instead of the single space from my comment. Cheers.
> ( newVec <- unlist(sapply(vec, strsplit, "\\s+", USE.NAMES = FALSE)) )
# [1] "super" "good" "dental" "associates" "cheap" "dentist"
# [7] "in" "bel" "air" "md" "dentures" "dentures"
#[13] "in" "office" "teeth" "whitening" "in" "office"
#[19] "teeth" "whitening" "dental" "gum" "surgery" "bel"
#[25] "air," "md" "dental" "implants" "dental" "implants"
#[31] "veneer" "teeth" "pictures"
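For what it's worth, strsplit() is itself vectorized: it returns a list with one character vector per input string, which is why [[1]] in the question picked out only the first element's words. A minimal sketch of the same split without sapply() follows; vec is retyped here from the question's printed output, and pieces and words are just illustrative names.

```r
# Input retyped from the question; note the trailing spaces in "dentures ".
vec <- c("super good dental associates", "cheap dentist in bel air md",
         "dentures ", "dentures ",
         "in office teeth whitening", "in office teeth whitening",
         "dental gum surgery bel air, md",
         "dental implants", "dental implants",
         "veneer teeth pictures")

# strsplit() handles the whole vector at once and returns a list,
# one character vector of words per element of vec.
pieces <- strsplit(vec, "\\s+")

# unlist() flattens that list into a single character vector of words.
words <- unlist(pieces)
```

Here length(pieces) equals length(vec), and words should hold the same 33 elements shown in the output above.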
And since I see a stray comma in there, it might be a good idea to clean out any remaining punctuation with a call to gsub:
> gsub("[[:punct:]]", "", newVec)
Upvotes: 2
Reputation: 9344
You could try:
strsplit(paste(vec, collapse = " "), ' ')[[1]]
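One caveat, sketched under the assumption that some elements carry a trailing space (as "dentures " does in the question): pasting with a space separator then produces two spaces in a row, and splitting on a literal single space leaves empty strings in the result. Splitting on "\\s+" avoids that. The small vector x and the names joined, single and multi are made up for illustration.

```r
# A small stand-in for vec, with trailing spaces on the first two elements.
x <- c("dentures ", "dentures ", "in office")

# Collapsing with " " yields double spaces after each "dentures".
joined <- paste(x, collapse = " ")

# Splitting on one literal space keeps an empty string between the two spaces.
single <- strsplit(joined, " ")[[1]]

# Splitting on runs of whitespace treats each run as one separator.
multi <- strsplit(joined, "\\s+")[[1]]
```

With this input, single contains "" entries while multi holds only the four words.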
Upvotes: 2