Reputation: 28129
I have a string:
string1 <- "This is my string"
I would like to convert it to a vector that looks like this:
vector1
"This"
"is"
"my"
"string"
How do I do this? I know I could use the tm package to convert it to a TermDocumentMatrix and then convert that to a matrix, but that would alphabetize the words, and I need them to stay in the same order.
Upvotes: 27
Views: 56131
Reputation: 99321
If you're simply extracting words by splitting on the spaces, here are a couple of nice alternatives.
string1 <- "This is my string"
scan(text = string1, what = "")
# [1] "This" "is" "my" "string"
library(stringi)
stri_split_fixed(string1, " ")[[1]]
# [1] "This" "is" "my" "string"
stri_extract_all_words(string1, simplify = TRUE)
# [,1] [,2] [,3] [,4]
# [1,] "This" "is" "my" "string"
stri_split_boundaries(string1, simplify = TRUE)
# [,1] [,2] [,3] [,4]
# [1,] "This " "is " "my " "string"
Upvotes: 5
Reputation: 708
As a supplement, we can also use unlist() to produce a vector from a given list structure:
string1 <- "This is my string"
unlist(strsplit(string1, "\\s+")) # strsplit() returns a list; unlist() flattens it into a vector
#[1] "This" "is" "my" "string"
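One caveat with splitting on "\\s+" is that leading whitespace produces an empty first element, while trailing whitespace does not. The sketch below assumes a hypothetical padded input (the name padded is not from the question) and uses base R's trimws() to strip it first:

```r
# Hypothetical input with leading and trailing spaces
padded <- "  This is my string  "

# A match at the start of the string yields an empty first element
raw_split <- unlist(strsplit(padded, "\\s+"))
raw_split
# [1] ""       "This"   "is"     "my"     "string"

# Trimming first avoids the empty element
clean_split <- unlist(strsplit(trimws(padded), "\\s+"))
clean_split
# [1] "This"   "is"     "my"     "string"
```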
Upvotes: 5
Reputation: 4711
Try:
library(tm)
library("RWeka")
library(RWekajars)
NGramTokenizer(string1, Weka_control(min = 1, max = 1))
This is an over-engineered solution for your problem, though. strsplit, as in Sacha's answer, is generally just fine.
Upvotes: 1
Reputation: 47541
Slightly different from Dason's answer: this splits on any amount of whitespace, including newlines:
string1 <- "This is my
string"
strsplit(string1, "\\s+")[[1]]
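To make the difference from a single-space split concrete, here is a small sketch on a hypothetical input (the name messy and its contents are made up for illustration) containing two consecutive spaces, a tab, and a newline:

```r
# Hypothetical input: two spaces, then a tab, then a newline
messy <- "This  is\tmy\nstring"

# Splitting on a single literal space leaves an empty element
# and does not break on the tab or the newline
by_space <- strsplit(messy, " ")[[1]]
length(by_space)
# [1] 3

# Splitting on runs of whitespace gives the four words
by_ws <- strsplit(messy, "\\s+")[[1]]
by_ws
# [1] "This"   "is"     "my"     "string"
```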
Upvotes: 15
Reputation: 61903
You can use strsplit to accomplish this task.
string1 <- "This is my string"
strsplit(string1, " ")[[1]]
#[1] "This" "is" "my" "string"
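The [[1]] is needed because strsplit is vectorised and returns a list with one element per input string. A minimal sketch, assuming a second made-up string purely for illustration:

```r
# Two input strings (the second one is hypothetical)
strings <- c("This is my string", "Another one")

# fixed = TRUE treats the separator as a literal space, not a regex
parts <- strsplit(strings, " ", fixed = TRUE)

length(parts)
# [1] 2
parts[[1]]
# [1] "This"   "is"     "my"     "string"
parts[[2]]
# [1] "Another" "one"
```

With a single input string, [[1]] and unlist() give the same character vector.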
Upvotes: 45