Reputation:
I have a list of words in a file. For example: NUT, CHANNEL, DIA, CARBON, STEEL, integrated, packaging, solutions.
I also have strings in which these words are run together, such as NUTCHANNELDIA 16U NCCARBONSTEEL. I need to split such a string as shown below:
a = NUTCHANNELDIA 16U NCCARBONSTEEL, integratedpackagingsolutions
a = split words(NUTCHANNELDIA 16U NCCARBONSTEEL, integratedpackagingsolutions)
a = NUT CHANNEL DIA 16U NC CARBON STEEL
Is there any method for that?
Upvotes: 1
Views: 94
Reputation: 10263
This is a very simple approach which might work for you:
word.list <- c("NUT", "CHANNEL", "DIA", "CARBON", "STEEL")
a <- "NUTCHANNELDIA 16U NCCARBONSTEEL"
for (word in word.list) {
  a <- gsub(word, paste0(word, " "), a)
}
print(a)
[1] "NUT CHANNEL DIA  16U NCCARBON STEEL "
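It is unclear to me whether you just want the string to be more readable or want it actually split up into a vector. Note also that a space is only added after each word, so NC stays attached to CARBON and a trailing space is left at the end. In any case, the above should be fairly simple to modify.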
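As one possible modification, here is a minimal sketch that pads each keyword with a space on both sides in a single gsub call and then collapses the extra whitespace. The names pattern, spaced, result and words are only illustrative, and it assumes that no keyword is a prefix of another one (otherwise the order of the alternatives would matter).

```r
word.list <- c("NUT", "CHANNEL", "DIA", "CARBON", "STEEL")
a <- "NUTCHANNELDIA 16U NCCARBONSTEEL"

# Build one alternation with a capture group: (NUT|CHANNEL|DIA|CARBON|STEEL)
pattern <- paste0("(", paste(word.list, collapse = "|"), ")")

# Put a space on both sides of every keyword
spaced <- gsub(pattern, " \\1 ", a)

# Collapse runs of whitespace and drop leading/trailing spaces
result <- trimws(gsub("[[:space:]]+", " ", spaced))
print(result)
# [1] "NUT CHANNEL DIA 16U NC CARBON STEEL"

# If a vector is wanted instead of a readable string
words <- strsplit(result, " ", fixed = TRUE)[[1]]
```

This keeps the idea of the loop above but also separates NC from CARBON, because the space is inserted before the keyword as well as after it.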
Upvotes: 0
Reputation: 522741
Here is a base R option using strsplit. We can try splitting on the following pattern:
(?<=NUT|CHANNEL|DIA|CARBON|STEEL)|(?<=.)(?=NUT|CHANNEL|DIA|CARBON|STEEL)
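This will split at any point in the string where what either precedes or follows is one of your keywords. Note that the (?<=.) term is necessary due to the way positive lookaheads in strsplit behave: without it, the lookahead would give an empty match at the very start of the remaining string, and strsplit would then split off a single character instead of the whole keyword.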
terms <- c("NUT", "CHANNEL", "DIA", "CARBON", "STEEL")
regex <- paste(terms, collapse="|")
a <- "NUTCHANNELDIA 16U NCCARBONSTEEL"
strsplit(a, paste0("(?<=", regex, ")|(?<=.)(?=", regex, ")"), perl=TRUE)
[[1]]
[1] "NUT" "CHANNEL" "DIA" " 16U NC" "CARBON" "STEEL"
The 16U NC term has some leading whitespace, which I didn't attempt to remove. If this is a concern, you could either trim whitespace on each term as you consume it, or we could try to modify the pattern to do that.
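For the first option, trimming each term after the split, a minimal sketch could look like this. It reuses the pattern from above; the names parts and joined are only illustrative, and trimws assumes R 3.2.0 or later.

```r
terms <- c("NUT", "CHANNEL", "DIA", "CARBON", "STEEL")
regex <- paste(terms, collapse = "|")
a <- "NUTCHANNELDIA 16U NCCARBONSTEEL"

# Same split as above; [[1]] extracts the character vector for the single input
parts <- strsplit(a, paste0("(?<=", regex, ")|(?<=.)(?=", regex, ")"), perl = TRUE)[[1]]

# Remove leading/trailing whitespace from every piece
parts <- trimws(parts)
print(parts)
# [1] "NUT"     "CHANNEL" "DIA"     "16U NC"  "CARBON"  "STEEL"

# Join back into one readable string, if that is the desired output
joined <- paste(parts, collapse = " ")
print(joined)
# [1] "NUT CHANNEL DIA 16U NC CARBON STEEL"
```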
Upvotes: 3