Find combinations of words using R

Question

I'm editing some text and wondering whether I can programmatically search for certain words.

The words almost, nearly, quite, close to and very do not work next to the words certain, complete, dead, entire, essential and extinct.

Let's say I have this character vector:

text <- c("R is a very essential tool for data analysis. While it is regarded as domain specific, it is a very complete programming language. Almost certainly, many people who would benefit from using R, do not use it")

Can I get R to return a numeric vector, giving line numbers (or sentence numbers) where these words are placed next to each other?

Note that I've used "certainly", so ideally R should match any word that contains "certain" (or one of the other words), rather than only the whole word "certain".

Andrie · Accepted Answer

Use grep for this, after splitting your text at sentence boundaries using strsplit:

stext <- strsplit(text, split="\\.")[[1]]
grep("certain", stext)
[1] 3
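
The same idea can be extended to the word combinations in the question. What follows is a minimal sketch rather than a definitive solution: the names `modifiers`, `absolutes`, `pattern` and `hits` are my own, and it assumes that "next to" means a modifier followed directly by whitespace and then a word that starts with one of the absolute terms (so "certainly" counts, but "uncertain" would not).

```r
text <- c("R is a very essential tool for data analysis. While it is regarded as domain specific, it is a very complete programming language. Almost certainly, many people who would benefit from using R, do not use it")

# Split into sentences at full stops, as above
stext <- strsplit(text, split = "\\.")[[1]]

# The two word lists from the question
modifiers <- c("almost", "nearly", "quite", "close to", "very")
absolutes <- c("certain", "complete", "dead", "entire", "essential", "extinct")

# One regular expression: a modifier, whitespace, then a word that
# begins with an absolute term. There is no word boundary after the
# absolute term, so "certainly" is matched as well as "certain".
pattern <- paste0(
  "\\b(", paste(modifiers, collapse = "|"), ")\\s+(",
  paste(absolutes, collapse = "|"), ")"
)

# Sentence numbers that contain at least one such combination;
# ignore.case = TRUE catches the capitalised "Almost"
hits <- grep(pattern, stext, ignore.case = TRUE, perl = TRUE)
hits
[1] 1 2 3
```

All three sentences are flagged here: "very essential", "very complete" and "Almost certainly". Note that splitting on every full stop is crude, since abbreviations or decimal numbers would also be treated as sentence boundaries.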
