Reputation: 43
I am using the 'esoph' data in R. It is a data frame with records for 88 age/alcohol/tobacco combinations. Here is an excerpt of its data:
What I am trying to achieve is to pracitcally remove the alcgp column but keep its data for the ncases and ncontrols columns. So for every entry which has the same age group (agegp) and tobacco group (tobgp) but different alcohol group (alcgp) to add up the values for cases and controls and store them in a single row.
For example row 1, 5, 9, 12 would be merged.
Upvotes: 0
Views: 149
Reputation: 470
This is a one-liner in dplyr
. Would highly recommend you take a look at it.
library(dplyr)
esoph %>%
group_by(agegp, tobgp) %>%
summarize(total_cases=sum(ncases),
total_controls = sum(controls))
...
...
agegp tobgp total_cases total_controls
1 25-34 0-9g/day 0 70
2 25-34 10-19 1 19
3 25-34 20-29 0 11
4 25-34 30+ 0 16
5 35-44 0-9g/day 2 109
6 35-44 10-19 4 46
7 35-44 20-29 3 27
8 35-44 30+ 0 17
9 45-54 0-9g/day 14 104
10 45-54 10-19 13 57
Upvotes: 1
Reputation: 75
It can be efficiently done in data.table
esoph[,alcgp := NULL] # remove alcgp column
Merge (sum) ncases and ncontrols by agegp and topgp. If you
esoph[, .(sum(ncases),sum(ncontrols)), .(agegp,topgp)]
you should convert your dataset to data.table first
library(data.table)
setDT(esoph)
Would be nice if you provide reproducible example, its hard to reproduce from screenshot.
Upvotes: 0