Reputation: 177
I have some data in multiple large data tables in R. I wish to merge and produce counts of various variables.
I can produce the counts within individual tables easily using the 'table' command, but I have not yet figured out the economical (preferably base R, one liner) command to then produce combined counts.
aaa<-table(MyData1$MyVar)
bbb<-table(MyData2$MyVar)
> aaa
Dogs 3
Cats 4
Horses 1
Sheep 2
Giraffes 3
> bbb
Dogs 27
Cats 1
Sheep 2
Ocelots 1
Desired Output:
Dogs 30
Cats 5
Horses 1
Sheep 4
Giraffes 3
Ocelots 1
I am sure there is a straightforward Base R way to do this I am just not seeing it.
Upvotes: 1
Views: 178
Reputation: 6921
First merge/concatenate your input, then apply table to it.
table(c(MyData1$MyVar, MyData2$MyVar))
You may run into issue if MyVar is a factor and its levels are different in MyData1 and MyData2. In this case, just lookup how to merge factor variables.
EDIT: if that doesn't suit your need, I suggest the following:
Merge the levels of all "MyVar" throughout all your "MyDatai" tables (from your example, I assume that it makes sense to do this).
total_levels <- unique(c(levels(MyData1$MyVar), levels(MyData2$MyVar)))
MyData1$MyVar <- factor(MyData1$MyVar, levels=total_levels)
MyData2$MyVar <- factor(MyData1$MyVar, levels=total_levels)
Obviously you will need wrap this into an apply-like function if you have around 100 data.frames.
Note that this is a one-time preprocessing operation, so I think it's ok if it is a bit costly. Ideally you can integrate it upstream when you generate/load the data.
At this point, all your "MyVar" have the same levels (but are still the same in terms of content of course). Now the good thing is, since table
works with the levels, all your tables will have the same entries:
aaa<-table(MyData1$MyVar)
bbb<-table(MyData2$MyVar)
> aaa
Dogs 3
Cats 4
Horses 1
Sheep 2
Giraffes 3
Ocelot 0
> bbb
Dogs 27
Cats 1
Horses 0
Sheep 2
Giraffes 0
Ocelots 1
And you can just sum them with aaa+bbb
or sum
if you have a lot. Addition of vectors is lightning fast :)
Upvotes: 2
Reputation: 13570
Base package:
aggregate(V2 ~ V1, data = rbind(df1, df2), FUN = sum)
dplyr:
library(dplyr)
rbind(df1, df2) %>% group_by(V1) %>% summarise(V2 = sum(V2))
Output:
V1 V2
1 Cats 5
2 Dogs 30
3 Giraffes 3
4 Horses 1
5 Sheep 4
6 Ocelots 1
Data:
df1 <- structure(list(V1 = structure(c(2L, 1L, 4L, 5L, 3L), .Label = c("Cats",
"Dogs", "Giraffes", "Horses", "Sheep"), class = "factor"), V2 = c(3L,
4L, 1L, 2L, 3L)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA,
-5L))
df2 <- structure(list(V1 = structure(c(2L, 1L, 4L, 3L), .Label = c("Cats",
"Dogs", "Ocelots", "Sheep"), class = "factor"), V2 = c(27L, 1L,
2L, 1L)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA,
-4L))
Upvotes: 2