schefflaa
schefflaa

Reputation: 43

How to 'ungroup' a pre-grouped data frame

I am using the 'esoph' data in R. It is a data frame with records for 88 age/alcohol/tobacco combinations. Here is an excerpt of its data:

Here is an excerpt of its data.

What I am trying to achieve is to pracitcally remove the alcgp column but keep its data for the ncases and ncontrols columns. So for every entry which has the same age group (agegp) and tobacco group (tobgp) but different alcohol group (alcgp) to add up the values for cases and controls and store them in a single row.

For example row 1, 5, 9, 12 would be merged.

Upvotes: 0

Views: 149

Answers (2)

asifzuba
asifzuba

Reputation: 470

This is a one-liner in dplyr. Would highly recommend you take a look at it.

library(dplyr)
esoph %>% 
    group_by(agegp, tobgp) %>% 
    summarize(total_cases=sum(ncases), 
              total_controls = sum(controls))
...
...
   agegp tobgp    total_cases total_controls
 1 25-34 0-9g/day           0             70
 2 25-34 10-19              1             19
 3 25-34 20-29              0             11
 4 25-34 30+                0             16
 5 35-44 0-9g/day           2            109
 6 35-44 10-19              4             46
 7 35-44 20-29              3             27
 8 35-44 30+                0             17
 9 45-54 0-9g/day          14            104
10 45-54 10-19             13             57

Upvotes: 1

Azat
Azat

Reputation: 75

It can be efficiently done in data.table

esoph[,alcgp := NULL] # remove alcgp column

Merge (sum) ncases and ncontrols by agegp and topgp. If you

esoph[, .(sum(ncases),sum(ncontrols)), .(agegp,topgp)]

Clarification:

you should convert your dataset to data.table first

library(data.table)
setDT(esoph)

Would be nice if you provide reproducible example, its hard to reproduce from screenshot.

Upvotes: 0

Related Questions