How to check if a paragraph is part of a text in R

Question

I have one paragraph of text (a vector of words) and I would like to see if it is "part" of a long text (a vector of words). However, I know that this paragraph does not appear in the text in its exact form, but with slight changes: a few words could be missing, the order could be slightly different, some words could be inserted as parenthetical elements, etc.

I am currently implementing solutions "by hand", such as checking whether most of the words of the paragraph are in the text, looking at the distance between these words, their order, etc. I was wondering, however, whether there is a built-in method to do that.

I already checked the tm package, but it does not seem to do that...

Any idea?

PinkFluffyUnicorn · Accepted Answer

I fear that you are stuck with hand-writing an approach, e.g. grep-ing some word groups and having some kind of matching threshold.
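
To make the "matching threshold" idea concrete, here is a minimal base-R sketch. The function name `fuzzy_locate` and the parameters `slack` and `threshold` are made up for illustration and are not from any package. It assumes both inputs are non-empty character vectors of words. It slides a window over the text and scores each window by the share of the paragraph's distinct words it contains, so it tolerates missing, inserted, and reordered words but does not check word order at all.

```r
# Hypothetical helper (not from any package): slide a window over the text
# and score each window by word overlap with the paragraph.
fuzzy_locate <- function(paragraph, text, slack = 1.5, threshold = 0.8) {
  paragraph <- tolower(paragraph)
  text <- tolower(text)
  target <- unique(paragraph)

  # The window is wider than the paragraph to leave room for inserted words.
  width <- min(length(text), ceiling(slack * length(paragraph)))
  starts <- seq_len(length(text) - width + 1)

  # Score = share of the paragraph's distinct words found in the window.
  scores <- vapply(starts, function(i) {
    mean(target %in% text[i:(i + width - 1)])
  }, numeric(1))

  # which.max() returns the first window reaching the best score.
  best <- which.max(scores)
  list(found = scores[best] >= threshold,
       start = starts[best],
       score = scores[best])
}

# Toy data: the paragraph is reordered and drops a word compared to the text.
text <- c("the", "quick", "brown", "fox", "jumps", "over", "the",
          "lazy", "dog", "and", "then", "sleeps", "all", "day")
paragraph <- c("fox", "over", "jumps", "lazy", "dog")

res <- fuzzy_locate(paragraph, text)
print(res)
```

The values of `slack` and `threshold` are arbitrary starting points and would need tuning on real data. If word order matters, you could add a second score on top of this one, for example by comparing the positions of the matched words inside the best window.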
