user10367073

How to split concatenated words in R

I have a list of words in a file, for example: NUT, CHANNEL, DIA, CARBON, STEEL, integrated, packaging, solutions.

I also have strings such as NUTCHANNELDIA 16U NCCARBONSTEEL, in which these words are run together. I need to split such a string into its words, as shown below:

a = NUTCHANNELDIA 16U NCCARBONSTEEL, integratedpackagingsolutions
a = split words(NUTCHANNELDIA 16U NCCARBONSTEEL, integratedpackagingsolutions)
a = NUT CHANNEL DIA 16U NC CARBON STEEL

Is there a method for that?

Upvotes: 1

Views: 94

Answers (2)

Anders Ellern Bilgrau

Reputation: 10263

This is a very simple approach which might work for you:

# Words to look for inside the string
word.list <- c("NUT", "CHANNEL", "DIA", "CARBON", "STEEL")

a <- "NUTCHANNELDIA 16U NCCARBONSTEEL"

# Append a space after every occurrence of each word
for (word in word.list) {
  a <- gsub(word, paste0(word, " "), a)
}

print(a)
[1] "NUT CHANNEL DIA  16U NCCARBON STEEL "

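It is unclear to me whether you just want the string to be more readable or want it actually split into a vector. Note that this only adds a space after each word, so NC stays attached to CARBON and a few extra spaces are left behind. In any case, the above should be fairly simple to modify.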
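One possible modification, if the goal is the exact output from the question, is to pad each word with a space on both sides and then collapse the repeated spaces. This is only a sketch and not part of the original answer: the helper name split_words is made up for illustration, and fixed = TRUE assumes the words should be matched literally rather than as regular expressions.

```r
# Hypothetical helper (not from the original answer): pads every known word
# with spaces, then squeezes runs of whitespace down to single spaces.
split_words <- function(x, words) {
  for (word in words) {
    # fixed = TRUE treats `word` as a literal string, not a regex
    x <- gsub(word, paste0(" ", word, " "), x, fixed = TRUE)
  }
  trimws(gsub("\\s+", " ", x))
}

word.list <- c("NUT", "CHANNEL", "DIA", "CARBON", "STEEL",
               "integrated", "packaging", "solutions")
a <- c("NUTCHANNELDIA 16U NCCARBONSTEEL", "integratedpackagingsolutions")

# One readable string per input
readable <- split_words(a, word.list)
print(readable)

# One character vector of tokens per input
tokens <- strsplit(readable, " ", fixed = TRUE)
print(tokens)
```

Like the loop above, this would over-split if one word in the list is contained in another (for example a list holding both STEEL and STEELS), so it is best suited to word lists without such overlaps.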

Upvotes: 0

Tim Biegeleisen

Reputation: 522741

Here is a base R option using strsplit. We can try splitting on the following pattern:

(?<=NUT|CHANNEL|DIA|CARBON|STEEL)|(?<=.)(?=NUT|CHANNEL|DIA|CARBON|STEEL)

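This splits at every position in the string that is immediately preceded or immediately followed by one of your keywords. Note that the (?<=.) term is necessary because of the way strsplit handles zero-width lookahead matches: without it, a lookahead that matches at the very start of the text still to be processed causes a single character to be split off.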
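The effect of the (?<=.) guard can be seen on a toy string. The sketch below is an illustration added here, not part of the original answer; the string "abcd" and the variable names are arbitrary, and the comments describe the behaviour as I understand it from base R's strsplit.

```r
x <- "abcd"

# Lookahead only: after the first split the remaining text starts with "c",
# so the lookahead matches again at its very beginning and strsplit
# splits off a single character there.
without_guard <- strsplit(x, "(?=c)", perl = TRUE)[[1]]

# With (?<=.), a match needs at least one character before it in the text
# currently being searched, which rules out a match at the very beginning.
with_guard <- strsplit(x, "(?<=.)(?=c)", perl = TRUE)[[1]]

print(without_guard)
print(with_guard)
```

Comparing the two printed vectors shows why the guarded form is the one used in the keyword pattern.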

terms <- c("NUT", "CHANNEL", "DIA", "CARBON", "STEEL")
# Build the alternation NUT|CHANNEL|DIA|CARBON|STEEL
regex <- paste(terms, collapse="|")
a <- "NUTCHANNELDIA 16U NCCARBONSTEEL"
# Split after any keyword, or before any keyword that is not at the start
strsplit(a, paste0("(?<=", regex, ")|(?<=.)(?=", regex, ")"), perl=TRUE)
[[1]]
[1] "NUT"     "CHANNEL" "DIA"     " 16U NC" "CARBON"  "STEEL"


The 16U NC term has some leading whitespace, which I didn't attempt to remove. If that is a concern, you could either trim the whitespace from each term as you consume it, or modify the pattern to do so.
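As a minimal sketch of the first option, trimws from base R can be applied to the pieces after splitting. The variable names parts, trimmed and joined are chosen here for illustration only.

```r
terms <- c("NUT", "CHANNEL", "DIA", "CARBON", "STEEL")
regex <- paste(terms, collapse = "|")
a <- "NUTCHANNELDIA 16U NCCARBONSTEEL"

# Same split as above, keeping the character vector for the single input
parts <- strsplit(a, paste0("(?<=", regex, ")|(?<=.)(?=", regex, ")"),
                  perl = TRUE)[[1]]

# Drop leading and trailing whitespace from every piece
trimmed <- trimws(parts)
print(trimmed)

# Or rebuild a single readable string
joined <- paste(trimmed, collapse = " ")
print(joined)
```

Trimming after the split keeps the pattern itself unchanged, which may be easier to maintain than folding optional whitespace into the lookarounds.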

Upvotes: 3
