Reputation: 13334
Say I have a string that reads:
"database service crashed due to monkeys in the circuit board and this is a serious problem."
How can I extract, say, the 5 words that follow the phrase 'due to'?
So I would get this:
monkeys in the circuit board
Upvotes: 0
Views: 270
Reputation: 269471
It's not clear whether you want a single string as output or one string per word, but assuming you want a single string, if x is the input string then this sub call will do it:
s <- sub(".*due to ((\\w+ ){4}\\w+).*", "\\1", x)
giving:
> s
[1] "monkeys in the circuit board"
Here is the regular expression on its own. The outer capture group matches four words, each followed by a space, and then a fifth word:
.*due to ((\w+ ){4}\w+).*
If you want separate words then
strsplit(s, " ")[[1]]
giving:
[1] "monkeys" "in" "the" "circuit" "board"
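One caveat with sub is that it returns the input unchanged when the pattern does not match. A minimal sketch of a wrapper that returns NA in that case is below; words_after is a hypothetical helper name, not something from base R, and it assumes the phrase contains no regex metacharacters.

```r
# Hypothetical helper: return the n words that follow `phrase`, or NA if
# the phrase (followed by at least n words) is not found.
# Assumes `phrase` contains no regex metacharacters.
words_after <- function(x, phrase, n = 5) {
  # Builds a pattern such as "due to ((\\w+ ){4}\\w+)" for n = 5
  pattern <- sprintf("%s ((\\w+ ){%d}\\w+)", phrase, n - 1)
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) == 0) return(NA_character_)
  m[2]  # first capture group: the n words as a single string
}

x <- "database service crashed due to monkeys in the circuit board and this is a serious problem."
s <- words_after(x, "due to")
no_match <- words_after("nothing relevant here", "due to")
```

As before, strsplit(s, " ")[[1]] splits the result into separate words.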
Upvotes: 2
Reputation: 7654
Here is another approach. It has the advantage over RStudent's of extracting the five important words that follow "due to", but it creates an odd stemming result. I suspect that can be solved too. The two lines could be combined of course.
library(stringr)
text <- "database service crashed due to monkeys in the circuit board and this is a serious problem."
text.short <- unlist(str_split(text, "due to"))
five <- str_extract_all(text.short[2], "(\\w){5}")[[1]]
five
[1] "monke" "circu" "board" "serio" "probl"
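The stemming effect comes from the pattern (\\w){5}, which matches runs of five word characters rather than five words. A possible stringr variant that captures five whole words instead is sketched below; it assumes the words are separated by single spaces, and five_words is just an illustrative variable name.

```r
library(stringr)

text <- "database service crashed due to monkeys in the circuit board and this is a serious problem."

# str_match() returns a matrix: column 1 is the full match,
# column 2 is the first capture group (the five words after "due to").
five_words <- str_match(text, "due to ((?:\\w+ ){4}\\w+)")[, 2]

# Split into individual words if separate strings are wanted
five <- str_split(five_words, " ")[[1]]
```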
Upvotes: 0
Reputation: 9618
What about this tinkered way?
v <- "database service crashed due to monkeys in the circuit board and this is a serious problem."
unlist(strsplit(unlist(strsplit(v, "due to"))[2], " "))[2:6]
[1] "monkeys" "in" "the" "circuit" "board"
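The index 2:6 is needed because the text after "due to" starts with a space, so the first element of the inner split is an empty string. A small variation, assuming the phrase is always followed by exactly one space, splits on "due to " instead and takes the first five elements:

```r
v <- "database service crashed due to monkeys in the circuit board and this is a serious problem."

# Split on the phrase including its trailing space, keep the part after it
rest <- strsplit(v, "due to ", fixed = TRUE)[[1]][2]

# Split the remainder on spaces and keep the first five words
words <- head(strsplit(rest, " ", fixed = TRUE)[[1]], 5)
```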
Upvotes: 2