Reputation: 21
How can I find the index (word position) of a word in a given string? The code below returns the starting character position of the word and its length. After finding the position of the word, I want to extract the preceding and succeeding words in my project.
library(stringr)
Output_text <- c("applicable to any future potential contract termination disputes as the tepco dispute was somewhat unique")
word_pos <- regexpr('termination', Output_text)
Output:
[1] 45
attr(,"match.length")
[1] 11
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
45 is the starting position of "termination", counted in characters (every character, including spaces, is counted).
11 is the length of the match.
Here, "termination" is the 7th word. How can I find that word position in R?
Appreciate your help.
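One way to get from the character position that regexpr() reports to a word position is to count the words that come before the match. This is a minimal sketch, assuming a single input string and that words are runs of non-whitespace characters; word_index_of() is a hypothetical helper, not a base R or stringr function. The \\b word boundaries keep the pattern from matching inside a longer word such as "determination".

```r
Output_text <- "applicable to any future potential contract termination disputes as the tepco dispute was somewhat unique"

# Hypothetical helper: turn the character offset from regexpr() into a
# 1-based word index. Assumes `text` is a single string and that words
# are runs of non-whitespace characters.
word_index_of <- function(text, pattern) {
  char_pos <- as.integer(regexpr(pattern, text, perl = TRUE))
  if (char_pos == -1L) return(NA_integer_)  # pattern not found
  before <- substr(text, 1, char_pos - 1L)
  # Count the words that occur before the match, then add 1 for the match itself
  n_before <- lengths(regmatches(before, gregexpr("\\S+", before, perl = TRUE)))
  n_before + 1L
}

word_index_of(Output_text, "\\btermination\\b")
# [1] 7
```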
Upvotes: 1
Views: 5604
Reputation: 1136
Here it is:
library(stringr)
Output_text <- c("applicable to any future potential contract termination disputes as the tepco dispute was somewhat unique")
words <- unlist(str_split(Output_text, " "))
which(words == "termination")
[1] 7
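Splitting on a single space and comparing exactly can miss the word when the text has repeated spaces or punctuation attached to it (e.g. "termination,"). A sketch in base R, assuming that lower-casing and stripping punctuation is an acceptable way to normalise the tokens; the input string is a hypothetical variant of the question's text:

```r
# Hypothetical variant of the question's text with a double space and a comma
text <- "applicable to any future  potential contract termination, disputes"

# Split on runs of whitespace rather than on a single space
words <- strsplit(trimws(text), "\\s+")[[1]]
# Compare against a normalised copy (lower case, punctuation removed),
# keeping the original tokens in `words` for later use
clean <- tolower(gsub("[[:punct:]]", "", words))
which(clean == "termination")
# [1] 7
```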
Edit:
To handle multiple occurrences of the word and get the next and previous words:
# Adding a few random "termination" words to the string:
Output_text <- c("applicable to any future potential contract termination disputes as the tepco dispute was termination somewhat unique termination")
words <- unlist(str_split(Output_text, " "))
t1 <- which(words == "termination")
next_keyword <- words[t1+1]
previous_keywords <- words[t1-1]
> next_keyword
[1] "disputes" "somewhat" NA
> previous_keywords
[1] "contract" "was" "unique"
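One caveat with words[t1-1]: if the word is the very first word, t1-1 contains 0, and R silently drops index 0, so the previous-word vector comes out shorter than t1 and no longer lines up with it. A sketch that maps out-of-range neighbours to NA explicitly and keeps the results side by side in a data frame (the leading "termination" is added to the text only to show this edge case):

```r
Output_text <- "termination applicable to any future potential contract termination disputes as the tepco dispute was termination somewhat unique termination"
words <- strsplit(Output_text, " ")[[1]]
t1 <- which(words == "termination")

# Index 0 would be silently dropped by `[`, so use NA for missing neighbours
prev_idx <- ifelse(t1 > 1L, t1 - 1L, NA_integer_)
next_idx <- ifelse(t1 < length(words), t1 + 1L, NA_integer_)

# One row per occurrence, so positions and neighbours stay aligned
neighbours <- data.frame(
  position = t1,
  previous = words[prev_idx],
  following = words[next_idx],
  stringsAsFactors = FALSE
)
neighbours
```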
Upvotes: 2
Reputation: 66874
The easiest way is to match "termination" together with the surrounding words using str_extract, and then remove "termination " with str_remove:
str_remove(str_extract(Output_text,"\\w+ termination \\w+"),"termination ")
[1] "contract disputes"
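Note that str_extract() returns only the first match. If the word can occur several times, stringr's str_match_all() with two capture groups returns the preceding and following words for every match. A sketch, assuming words are separated by single spaces; matches cannot overlap, so a final "termination" with no following word is not matched, and two occurrences separated by a single word only produce one row:

```r
library(stringr)

Output_text <- "applicable to any future potential contract termination disputes as the tepco dispute was termination somewhat unique termination"

# Each row of the matrix is one match: column 1 is the full match,
# columns 2 and 3 are the captured preceding and following words
m <- str_match_all(Output_text, "(\\S+) termination (\\S+)")[[1]]
m[, 2:3]
```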
Upvotes: 0
Reputation: 1664
You can do this without worrying about character indices using regular expressions without any external package.
# replace whole string by the words preceding and following 'termination'
(words <- sub("[\\S\\s]+ (\\S+) termination (\\S+) [\\S\\s]+", "\\1 \\2", Output_text, perl = TRUE))
# [1] "contract disputes"
# Split the resulting string into two individual strings
(words <- unlist(strsplit(words, " ")))
# [1] "contract" "disputes"
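One thing to watch with the sub() version: when the pattern does not match, sub() returns the input unchanged, so the strsplit() step silently returns every word of the sentence. A base R sketch using regexec() and regmatches() instead returns the two captured words directly and an empty vector when there is no match. neighbours_of() is a hypothetical helper name, and it assumes the word contains no regex metacharacters:

```r
Output_text <- "applicable to any future potential contract termination disputes as the tepco dispute was somewhat unique"

# Hypothetical helper: return the words immediately before and after `word`,
# or character(0) if `word` does not appear with a word on each side.
# Assumes `word` contains no regex metacharacters.
neighbours_of <- function(text, word) {
  pattern <- paste0("(\\S+) ", word, " (\\S+)")
  m <- regmatches(text, regexec(pattern, text, perl = TRUE))[[1]]
  m[-1]  # drop the full match, keep the two capture groups
}

neighbours_of(Output_text, "termination")
# [1] "contract" "disputes"
neighbours_of(Output_text, "arbitration")
# character(0)
```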
Upvotes: 0