Reputation: 41
I'm a newbie to R, but really like it and want to improve constantly. Now, after searching for a while, I need to ask you for help.
This is the given case:
1) I have sentences (sentence.1 and sentence.2 - all words are already lower-case) and create the sorted frequency lists of their words:
sentence.1 <- "bob buys this car, although his old car is still fine." # saves the sentence into sentence.1
sentence.2 <- "a car can cost you very much per month."
sentence.1.list <- strsplit(sentence.1, "\\W+", perl=TRUE) # split the sentence at runs of non-word characters (I have these commands thanks to Stefan Gries)
sentence.2.list <- strsplit(sentence.2, "\\W+", perl=TRUE)
sentence.1.vector <- unlist(sentence.1.list) # flatten the list into a character vector
sentence.2.vector <- unlist(sentence.2.list)
sentence.1.freq <- table(sentence.1.vector) # finally, create the frequency lists
sentence.2.freq <- table(sentence.2.vector)
These are the results:
sentence.1.freq:
although      bob     buys      car     fine      his       is      old    still     this
       1        1        1        2        1        1        1        1        1        1
sentence.2.freq:
    a   can   car  cost month  much   per  very   you
    1     1     1     1     1     1     1     1     1
Now, how could I combine these two frequency lists so that I get the following?
       a although      bob     buys      can      car     cost     fine      his       is    month     much      old      per    still     this     very      you
      NA        1        1        1       NA        2       NA        1        1        1       NA       NA        1       NA        1        1       NA       NA
       1       NA       NA       NA        1        1        1       NA       NA       NA        1        1       NA        1       NA       NA        1        1
Thus, this "table" should be "flexible" so that in case of entering a new sentence with the word, e.g. "and", the table would add the column with the label "and" between "a" and "although".
I thought of just adding each new sentence as a new row, putting all words that are not yet in the list into new columns (here, "and" would go to the right of "you"), and then sorting the list again. However, I haven't managed this, because I can't even get the new sentence's word frequencies matched to the existing labels: when there is, e.g., "car" again, the new sentence's frequency of "car" should be written into the new sentence's row and the existing "car" column, but when a word such as "you" appears for the first time, its frequency should be written into the new sentence's row and a new column labeled "you".
Upvotes: 4
Views: 1846
Reputation: 173577
This isn't exactly what you describe, but what you're aiming for makes more sense to me organized by row, rather than by column (and R handles data organized this way a bit more easily anyway).
#Convert tables to data frames
a1 <- as.data.frame(sentence.1.freq)
a2 <- as.data.frame(sentence.2.freq)
#There are other options here, see note below
colnames(a1) <- colnames(a2) <- c('word','freq')
#Then merge
merge(a1,a2,by = "word",all = TRUE)
       word freq.x freq.y
1  although      1     NA
2       bob      1     NA
3      buys      1     NA
4       car      2      1
5      fine      1     NA
6       his      1     NA
7        is      1     NA
8       old      1     NA
9     still      1     NA
10     this      1     NA
11        a     NA      1
12      can     NA      1
13     cost     NA      1
14    month     NA      1
15     much     NA      1
16      per     NA      1
17     very     NA      1
18      you     NA      1
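If the sentence-per-row layout from the question is still wanted, one alternative is to cross-tabulate the words against a sentence label instead of merging. This is only a sketch of a different technique, not part of the merge approach above: the names long and wide and the labels "sentence.1" and "sentence.2" are made up for illustration, and table() reports absent words as 0, so the zeros are replaced with NA afterwards.

```r
# Rebuild the word vectors from the question
sentence.1 <- "bob buys this car, although his old car is still fine."
sentence.2 <- "a car can cost you very much per month."
words.1 <- unlist(strsplit(sentence.1, "\\W+", perl = TRUE))
words.2 <- unlist(strsplit(sentence.2, "\\W+", perl = TRUE))

# One row per word occurrence, labelled with the sentence it came from
# (the labels are illustrative, not from the original post)
long <- data.frame(
  sentence = rep(c("sentence.1", "sentence.2"),
                 times = c(length(words.1), length(words.2))),
  word = c(words.1, words.2)
)

# Cross-tabulate: one row per sentence, one column per distinct word
wide <- table(long$sentence, long$word)

# table() counts absent words as 0; use NA to match the layout in the question
wide[wide == 0] <- NA
wide
```

Adding another sentence then only means appending its words to long with a new label; any new word gets its own column automatically when the table is rebuilt.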
You can then keep using merge to add more sentences. I renamed the columns for simplicity, but there are other options: the by.x and by.y arguments of merge (instead of just by) specify which columns to merge on when the names differ between the two data frames. Also, the suffixes argument of merge controls how the count columns are given unique names. The default is to append .x and .y, but you can change that.
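As a rough sketch of those options, the following merges the two frequency tables without renaming them first, using by.x, by.y and suffixes, and then merges in a third sentence. The third sentence, the suffixes ".s1" and ".s2", and the names m12, m123 and Freq.s3 are assumptions made up for this example; stringsAsFactors = FALSE is also my addition, so that the words are kept as plain character strings.

```r
# Word vectors as in the question; the third sentence is hypothetical
sentence.1.vector <- unlist(strsplit(
  "bob buys this car, although his old car is still fine.", "\\W+", perl = TRUE))
sentence.2.vector <- unlist(strsplit(
  "a car can cost you very much per month.", "\\W+", perl = TRUE))
sentence.3.vector <- unlist(strsplit(
  "and you can buy a car", "\\W+", perl = TRUE))

# Each data frame gets a word column named after its vector, plus "Freq"
a1 <- as.data.frame(table(sentence.1.vector), stringsAsFactors = FALSE)
a2 <- as.data.frame(table(sentence.2.vector), stringsAsFactors = FALSE)
a3 <- as.data.frame(table(sentence.3.vector), stringsAsFactors = FALSE)

# The key columns have different names, so name them with by.x and by.y;
# suffixes replaces the default .x / .y on the two clashing "Freq" columns
m12 <- merge(a1, a2,
             by.x = "sentence.1.vector", by.y = "sentence.2.vector",
             all = TRUE, suffixes = c(".s1", ".s2"))

# Give the key column a neutral name, then merge in the third sentence
names(m12)[1] <- "word"
names(a3) <- c("word", "Freq.s3")
m123 <- merge(m12, a3, by = "word", all = TRUE)
m123
```

Words that appear for the first time in the third sentence (here "and" and "buy") show up as new rows with NA in the earlier count columns.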
Upvotes: 3