MysteryGuy

Reputation: 1151

Is it possible to use the `kwic` function to find words near each other?

I found this reference: https://www.safaribooksonline.com/library/view/regular-expressions-cookbook/9781449327453/ch05s07.html Is it possible to use it with the `kwic` function in the quanteda package to find documents in a corpus containing words that are not directly adjacent but close to each other, with maybe a few other words between them?

For example, if I give the function two words, I would like to find the documents in the corpus where both words occur, possibly with some words in between. If you give me "engine" and "electrical", I should also get the reports where "electrical synchronous engine" appears, but not the ones in which "engine" and "electrical" appear in completely different contexts.

Upvotes: 1

Views: 599

Answers (1)

Kohei Watanabe

Reputation: 890

quanteda does not have a NEAR operator, but you can do the same thing using the window argument of tokens_select(). In this example, I keep only the tokens within five words of "america*" and then search those with kwic():

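library(quanteda)
# tokenize the built-in corpus of US inaugural addresses
toks <- tokens(data_corpus_inaugural)
# keep "america*" matches plus the five tokens on either side of each match
toks_america <- tokens_select(toks, "america*", window = 5)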

kwic(toks_america, "econom*")
# [2013-Obama, 45] has been tested by crises | economic | recovery has begun. America's

kwic(toks_america, "power")
# [1997-Clinton, 85] it can give Americans the | power | to make a government is
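Applied to the example in the question, the same idea could look roughly like the sketch below. The two toy documents, the object names, and the window size of 3 are assumptions made up for illustration; only tokens(), tokens_select() and kwic() come from quanteda. Punctuation is kept by default, so it counts toward the window.

```r
library(quanteda)

# Two made-up reports (hypothetical data, not from any real corpus)
txt <- c(
  d1 = "The electrical synchronous engine was replaced.",
  d2 = "The engine failed. After a long inspection of every unrelated cabin part, the crew checked the electrical panel."
)
toks <- tokens(txt)

# Keep "engine" and up to three tokens on each side of it
toks_engine <- tokens_select(toks, "engine", window = 3)

# Search the reduced tokens for the second word
hits <- kwic(toks_engine, "electrical")

# Names of the documents where the two words occur close together
near_docs <- unique(hits$docname)
near_docs
```

Here near_docs should list only the documents in which "electrical" falls inside the window around "engine". Widening or narrowing window changes how many intervening words are tolerated.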

Upvotes: 2
