Reputation: 177

Create merged tables in R

I have some data in multiple large data tables in R. I wish to merge and produce counts of various variables.

I can produce the counts within individual tables easily using the 'table' command, but I have not yet figured out the economical (preferably base R, one liner) command to then produce combined counts.

aaa<-table(MyData1$MyVar)
bbb<-table(MyData2$MyVar)

> aaa
Dogs     3
Cats     4
Horses   1
Sheep    2
Giraffes 3

> bbb
Dogs     27
Cats      1
Sheep     2
Ocelots   1

Desired Output:

Dogs     30
Cats      5
Horses    1
Sheep     4
Giraffes  3
Ocelots   1

I am sure there is a straightforward Base R way to do this I am just not seeing it.

Upvotes: 1

Answers (2)

asachet

Reputation: 6921

First merge/concatenate your input, then apply table to it.

table(c(MyData1$MyVar, MyData2$MyVar))

You may run into issue if MyVar is a factor and its levels are different in MyData1 and MyData2. In this case, just lookup how to merge factor variables.

EDIT: if that doesn't suit your need, I suggest the following:

Merge the levels of all "MyVar" throughout all your "MyDatai" tables (from your example, I assume that it makes sense to do this).

total_levels <- unique(c(levels(MyData1$MyVar), levels(MyData2$MyVar)))

MyData1$MyVar <- factor(MyData1$MyVar, levels=total_levels)
MyData2$MyVar <- factor(MyData1$MyVar, levels=total_levels)

Obviously you will need wrap this into an apply-like function if you have around 100 data.frames.

Note that this is a one-time preprocessing operation, so I think it's ok if it is a bit costly. Ideally you can integrate it upstream when you generate/load the data.

At this point, all your "MyVar" have the same levels (but are still the same in terms of content of course). Now the good thing is, since table works with the levels, all your tables will have the same entries:

aaa<-table(MyData1$MyVar)
bbb<-table(MyData2$MyVar)

> aaa
Dogs     3
Cats     4
Horses   1
Sheep    2
Giraffes 3
Ocelot   0

> bbb
Dogs     27
Cats      1
Horses    0
Sheep     2
Giraffes  0
Ocelots   1

And you can just sum them with aaa+bbb or sum if you have a lot. Addition of vectors is lightning fast :)

Upvotes: 2

mpalanco

Reputation: 13580

Base package:

aggregate(V2 ~ V1, data = rbind(df1, df2), FUN = sum)

dplyr:

library(dplyr)
rbind(df1, df2) %>% group_by(V1) %>% summarise(V2 = sum(V2))

Output:

        V1 V2
1     Cats  5
2     Dogs 30
3 Giraffes  3
4   Horses  1
5    Sheep  4
6  Ocelots  1

Data:

df1 <- structure(list(V1 = structure(c(2L, 1L, 4L, 5L, 3L), .Label = c("Cats", 
"Dogs", "Giraffes", "Horses", "Sheep"), class = "factor"), V2 = c(3L, 
4L, 1L, 2L, 3L)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, 
-5L))

df2 <- structure(list(V1 = structure(c(2L, 1L, 4L, 3L), .Label = c("Cats", 
"Dogs", "Ocelots", "Sheep"), class = "factor"), V2 = c(27L, 1L, 
2L, 1L)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, 
-4L))

Upvotes: 2

Create merged tables in R

Answers (2)

Related Questions