kfkashfkaugfjbicbc

Reputation: 31

Ngram in R: calculating word frequency and sum of values

I would like to perform the following calculations:

Input:

Column_A                    Column_B
Word_A                      10
Word_A Word_B               20
Word_B Word_A               30
Word_A Word_B Word_C        40

Output:

Column_A1                   Column_B1
Word_A                      100 = 10+20+30+40
Word_B                      90  = 20+30+40
Word_C                      40  = 40
Word_A Word_B               90  = 20+30+40
Word_A Word_C               40  = 40
Word_B Word_C               40  = 40
Word_A Word_B Word_C        40  = 40

The order of the words in the output does not matter, so Word_A Word_B = 90 = Word_B Word_A. Using the RWeka and tm libraries I was able to extract unigrams (single words), but I will need n-grams with n = 1, 2, 3 and to calculate Column_B1.

Upvotes: 1

Views: 120

Answers (1)

alistaire

Reputation: 43354

A tidyverse approach:

library(tidyverse)
library(tokenizers)

# sample data from the question
df <- tibble(
    Column_A = c("Word_A", "Word_A Word_B", "Word_B Word_A", "Word_A Word_B Word_C"),
    Column_B = c(10L, 20L, 30L, 40L)
)

df %>% 
    rowwise() %>% 
    # collect contiguous 1- to 3-grams plus skip bigrams (to catch e.g. "Word_A Word_C")
    mutate(ngram = list(c(tokenize_ngrams(Column_A, lowercase = FALSE, n = 3, n_min = 1), 
                          tokenize_skip_ngrams(Column_A, lowercase = FALSE, n = 2), 
                          recursive = TRUE)), 
           # sort the words within each ngram so word order doesn't matter, then deduplicate
           ngram = list(unique(map_chr(strsplit(ngram, ' '), 
                                       ~paste(sort(.x), collapse = ' '))))) %>% 
    unnest(ngram) %>% 
    count(ngram, wt = Column_B)

## # A tibble: 7 × 2
##                  ngram     n
##                  <chr> <int>
## 1               Word_A   100
## 2        Word_A Word_B    90
## 3 Word_A Word_B Word_C    40
## 4        Word_A Word_C    40
## 5               Word_B    90
## 6        Word_B Word_C    40
## 7               Word_C    40

Note this is currently only robust for strings of up to three words. For longer strings you would have to decide how far you want the skip n-grams to reach, or take a different approach altogether.
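One such different approach, sketched here under the assumption that word order never matters (the helper `all_subsets` is mine, not from the answer above): since each output n-gram is just a non-empty subset of a row's distinct words, base R's combn can enumerate them for strings of any length, with no skip-gram tuning needed.

```r
library(tidyverse)

# sample data from the question
df <- tibble(
    Column_A = c("Word_A", "Word_A Word_B", "Word_B Word_A", "Word_A Word_B Word_C"),
    Column_B = c(10L, 20L, 30L, 40L)
)

# all non-empty subsets of a string's distinct words, each joined in sorted order
all_subsets <- function(s) {
    words <- sort(unique(strsplit(s, " ")[[1]]))
    unlist(lapply(seq_along(words),
                  function(k) combn(words, k, paste, collapse = " ")))
}

res <- df %>%
    mutate(ngram = map(Column_A, all_subsets)) %>%
    unnest(ngram) %>%
    count(ngram, wt = Column_B)

res  # reproduces the 7-row table above: Word_A 100, Word_A Word_B 90, ...
```

This trades the tokenizer calls for an exponential subset enumeration, which is fine for short phrases but worth keeping in mind for rows with many words.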

Upvotes: 1
