R: combining frequency lists with different lengths by labels?

Question

I'm new to R, but I really like it and want to keep improving. After searching for a while, I now need to ask for help.

Here is the situation:

I have two sentences (sentence.1 and sentence.2; all words are already lower-case) and create a sorted frequency list of the words in each:

sentence.1 <- "bob buys this car, although his old car is still fine." # saves the sentence into sentence.1
sentence.2 <- "a car can cost you very much per month."

sentence.1.list <- strsplit(sentence.1, "\\W+", perl = TRUE) # split the sentence at runs of non-word characters (I have these commands thanks to Stefan Gries)
sentence.2.list <- strsplit(sentence.2, "\\W+", perl = TRUE)

sentence.1.vector <- unlist(sentence.1.list) # strsplit() returns a list, so flatten it into a character vector
sentence.2.vector <- unlist(sentence.2.list)

sentence.1.freq <- table(sentence.1.vector) # finally, count each word; table() sorts the words alphabetically
sentence.2.freq <- table(sentence.2.vector)
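Since every new sentence has to go through the same three steps, they can be wrapped in one function. This is a minimal sketch; `word_freq` is a hypothetical name, not part of base R. The extra filter is an addition: `strsplit()` returns an empty string `""` as the first element when a sentence starts with a non-word character (for example a quotation mark), and that empty string would otherwise be counted as a word.

```r
# word_freq is a hypothetical helper, not part of base R
word_freq <- function(sentence) {
  words <- unlist(strsplit(sentence, "\\W+", perl = TRUE))  # split at runs of non-word characters
  words <- words[words != ""]  # drop the empty string produced by a leading non-word character
  table(words)                 # counts, sorted alphabetically by word
}

sentence.1.freq <- word_freq("bob buys this car, although his old car is still fine.")
sentence.1.freq["car"]
```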

These are the results:

sentence.1.freq:
although      bob     buys      car     fine      his       is      old    still     this 
       1        1        1        2        1        1        1        1        1        1

sentence.2.freq:
a   can   car  cost month  much   per  very   you 
1     1     1     1     1     1     1     1     1

Now, how can I combine these two frequency lists so that I get the following?

 a although bob buys can car cost fine his is month much old per still this very you
NA        1   1    1  NA   2   NA    1   1  1    NA   NA   1  NA     1    1   NA  NA
 1       NA  NA   NA   1   1    1   NA  NA NA     1    1  NA   1    NA   NA    1   1
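One way to get exactly this layout is to take the sorted union of both tables' labels and then look up every label in each table by name. This is a sketch under the assumption that the two tables are built as in the code above (they are recreated here so the block runs on its own); `row_for` is a hypothetical helper name. `match()` returns `NA` for a label a table does not contain, which produces the `NA` cells.

```r
sentence.1 <- "bob buys this car, although his old car is still fine."
sentence.2 <- "a car can cost you very much per month."
sentence.1.freq <- table(unlist(strsplit(sentence.1, "\\W+", perl = TRUE)))
sentence.2.freq <- table(unlist(strsplit(sentence.2, "\\W+", perl = TRUE)))

# Every label from either list, sorted alphabetically
all.words <- sort(union(names(sentence.1.freq), names(sentence.2.freq)))

# Look up each label in one frequency table; match() gives NA for labels it lacks
row_for <- function(freq) as.vector(freq)[match(all.words, names(freq))]

combined <- rbind(sentence.1 = row_for(sentence.1.freq),
                  sentence.2 = row_for(sentence.2.freq))
colnames(combined) <- all.words
combined
```

Because the lookup is by label and not by position, the two tables may have different lengths and different words.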

That is, this "table" should be "flexible": if I enter a new sentence containing a new word, e.g. "and", a column labeled "and" should be added at its alphabetical position (between "although" and "bob").
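A different technique that sidesteps column bookkeeping is to rebuild the whole table from all sentences whenever one is added: record which sentence each word came from and cross-tabulate with `table()`, which creates one column per distinct word in alphabetical order. This is a sketch; the third sentence is a hypothetical example. `table()` fills missing combinations with 0, so those are replaced by `NA` to match the desired output.

```r
sentences <- c("bob buys this car, although his old car is still fine.",
               "a car can cost you very much per month.",
               "and a car.")  # hypothetical third sentence introducing "and"

words <- lapply(sentences, function(s) unlist(strsplit(s, "\\W+", perl = TRUE)))

# Long format: one entry per word occurrence, tagged with its sentence number
sentence.id <- rep(seq_along(words), lengths(words))
word <- unlist(words)

# Cross-tabulate; columns come out in alphabetical order
combined <- unclass(table(sentence.id, word))
combined[combined == 0] <- NA  # a word absent from a sentence gets NA instead of 0
combined[, 1:4]
```

Adding a sentence then only means appending it to `sentences` and rerunning the last few lines.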

My idea was to add each new sentence as a new row, append every word that is not yet in the list as a new column (so "and" would first go to the right of "you"), and then sort the columns again. However, I haven't managed this, because I can't even get the new sentence's word frequencies placed under the existing labels. For example, if the new sentence contains "car" again, its count should go into the new sentence's row under the existing "car" column; but if it contains a word for the first time, e.g. "you", its count should go into the new sentence's row under a new column labeled "you".
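The incremental approach described above (new row, new columns for unseen words, re-sort) can be sketched as a function that allocates a matrix over the sorted union of old and new labels, then copies the old rows and the new counts in by label rather than by position. `add_sentence` and the third sentence are hypothetical; the sketch assumes an integer count matrix with column names, starting from `NULL`.

```r
# add_sentence is a hypothetical helper: adds one sentence as a new row of a
# word-count matrix, creating columns for words not seen before
add_sentence <- function(combined, sentence) {
  words <- unlist(strsplit(sentence, "\\W+", perl = TRUE))
  freq <- table(words[words != ""])

  old.cols <- if (is.null(combined)) character(0) else colnames(combined)
  old.rows <- if (is.null(combined)) 0L else nrow(combined)

  # Existing labels plus any new ones, kept in alphabetical order
  all.cols <- sort(union(old.cols, names(freq)))
  out <- matrix(NA_integer_, nrow = old.rows + 1L, ncol = length(all.cols),
                dimnames = list(NULL, all.cols))

  # Copy the old rows into their (possibly shifted) columns, matched by label
  if (old.rows > 0) out[seq_len(old.rows), old.cols] <- combined
  # Write the new sentence's counts into the last row, matched by label
  out[old.rows + 1L, names(freq)] <- as.vector(freq)
  out
}

combined <- NULL
combined <- add_sentence(combined, "bob buys this car, although his old car is still fine.")
combined <- add_sentence(combined, "a car can cost you very much per month.")
combined <- add_sentence(combined, "and you can buy a car.")  # hypothetical third sentence
combined
```

Indexing by column name is what solves the "car" versus "you" problem: an existing label receives the count in place, and a new label already has an `NA`-filled column waiting for it.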
