Prabhakar Reddy

Reputation: 21

How to find the index or position of a word in a given string using R

How can I find the index (position) of a word in a given string? The code below returns the starting character position of the word and the length of the match. After finding the position of the word, I want to extract the preceding and succeeding words in my project.

library(stringr)
Output_text <- c("applicable to any future potential contract termination disputes as the tepco dispute was somewhat unique")

word_pos <- regexpr('termination', Output_text)


Output:

[1] 45
attr(,"match.length")
[1] 11
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE

45 is the starting position of "termination", counted character by character.

11 is the length of the match.

Here, "termination" is the 7th word. How can I find that word position using R?

Appreciate your help.

Upvotes: 1

Views: 5604

Answers (3)

denisafonin

Reputation: 1136

Here it is:

library(stringr)

Output_text <- c("applicable to any future potential contract termination disputes as the tepco dispute was somewhat unique")

words <- unlist(str_split(Output_text, " "))

which(words == "termination")
[1] 7
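
If you would rather start from the character position that regexpr() already gives you, you can count the words in front of the match. This is only a sketch: it assumes words are separated by single spaces and that the match starts at the beginning of a word. The variable names are illustrative.

```r
Output_text <- "applicable to any future potential contract termination disputes as the tepco dispute was somewhat unique"

# Character position of the first match; as.integer() drops the attributes
char_pos <- as.integer(regexpr("termination", Output_text, fixed = TRUE))

word_index <- if (char_pos > 0) {
  # Everything before the match, then count the words in it
  before <- substr(Output_text, 1, char_pos - 1)
  length(strsplit(before, " ", fixed = TRUE)[[1]]) + 1L
} else {
  NA_integer_  # regexpr() returns -1 when there is no match
}

word_index
# [1] 7
```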

Edit:

To handle multiple occurrences of the word in the text and get the next and previous keywords:

# Adding a few random "termination" words to the string:

Output_text <- c("applicable to any future potential contract termination disputes as the tepco dispute was termination somewhat unique termination")

words <- unlist(str_split(Output_text, " "))

t1 <- which(words == "termination")
next_keyword <- words[t1+1]
previous_keywords <- words[t1-1]

> next_keyword
[1] "disputes" "somewhat" NA        
> previous_keywords
[1] "contract" "was"      "unique" 
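
One caveat with indexing by t1-1 and t1+1: R silently drops a zero index, so if the word is the very first one in the text, words[t1-1] comes back shorter than t1 and the previous/next vectors no longer line up. A minimal sketch of a guard, assuming the text is separated by single spaces (word_context is a hypothetical helper name, not a stringr function):

```r
# Hypothetical helper: position of each match plus its neighbours,
# with NA where there is no previous or next word.
word_context <- function(text, target) {
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  pos <- which(words == target)
  # Use NA instead of an out-of-range index at the edges
  prev_idx <- ifelse(pos > 1, pos - 1, NA_integer_)
  next_idx <- ifelse(pos < length(words), pos + 1, NA_integer_)
  data.frame(
    position  = pos,
    previous  = words[prev_idx],
    following = words[next_idx],
    stringsAsFactors = FALSE
  )
}

ctx <- word_context("termination of the contract termination clause", "termination")
ctx
#   position previous following
# 1        1     <NA>        of
# 2        5 contract    clause
```

Each row keeps the position and both neighbours together, so an occurrence at the start or end of the text shows up as NA rather than shifting the results.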

Upvotes: 2

James

Reputation: 66874

The easiest way is to match "termination" together with its surrounding words using str_extract, and then strip "termination" out of the result with str_remove.

str_remove(str_extract(Output_text, "\\w+ termination \\w+"), "termination ")
[1] "contract disputes"

Upvotes: 0

jkd

Reputation: 1664

You can do this with regular expressions in base R, without worrying about character indices and without loading any external package.

# Replace the whole string with the words preceding and following 'termination'
(words <- sub("[\\S\\s]+ (\\S+) termination (\\S+) [\\S\\s]+", "\\1 \\2", Output_text, perl = TRUE))
# [1] "contract disputes"

# Split the resulting string into two individual strings
(words <- unlist(strsplit(words, " ")))
# [1] "contract" "disputes"
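
Note that sub() returns its input unchanged when the pattern does not match, and the pattern above needs extra text on both sides of the neighbouring words, so a short string such as "contract termination disputes" comes back whole. A possible alternative, sketched here with regexec() and regmatches(), makes the no-match case explicit. neighbours_of is a hypothetical helper name, and it assumes the target word contains no regex metacharacters.

```r
# Hypothetical helper: returns c(previous, following) for the first match,
# or two NAs when the target has no word on both sides.
neighbours_of <- function(text, target) {
  # target is pasted into the pattern as-is (assumed to be a plain word)
  pattern <- paste0("(\\S+) ", target, " (\\S+)")
  m <- regmatches(text, regexec(pattern, text, perl = TRUE))[[1]]
  # m holds the full match followed by the two capture groups
  if (length(m) == 3) m[2:3] else c(NA_character_, NA_character_)
}

short_text <- "contract termination disputes"
neighbours_of(short_text, "termination")
# [1] "contract" "disputes"
neighbours_of(short_text, "tepco")
# [1] NA NA
```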

Upvotes: 0
