Reputation: 8474
I want to summarise the percentage of people that have been treated BY region.
I have created a dummy dataset for this purpose:
id <- 1:1000
region <- rep(c("A","B","C","D","E"), each = 200)
treatment <- rep(1:2, each = 4)  # length 8; recycled to 1000 rows by data.frame()
d <- data.frame(id, region, treatment)
How would I find out (a) the total number of people in each region (I presume I would use length for this purpose) and (b) the percentage of people who had treatment 1 (as opposed to 2) BY region?
I will have NAs for some of the IDs, so if this could be incorporated in the code from the outset, that would be appreciated.
I have used ddply in the past to summarise a continuous variable (e.g. by its mean) but am struggling when using a factor variable.
Any help would be greatly appreciated.
Upvotes: 2
Views: 7716
Reputation: 5239
For completeness, here's how you can do it using ddply() from plyr:
library(plyr)
ddply(d[!is.na(d$id), ], .(region), summarize,
      N = length(region),
      prop = mean(treatment == 1))
# region N prop
# 1 A 200 0.5
# 2 B 200 0.5
# 3 C 200 0.5
# 4 D 200 0.5
# 5 E 200 0.5
This assumes that you want to deal with the NA values in id by removing the observation.
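If the missing values turn up in treatment rather than in id, mean(treatment == 1) returns NA for any region that contains one. A minimal sketch of one way to handle that, assuming you want the proportion among people with a recorded treatment; the NA positions and the N_known column are made up for illustration:

```r
library(plyr)

# Dummy data as in the question, with a few treatment values set to NA
# (which rows are NA is an assumption for this example).
d <- data.frame(id = 1:1000,
                region = rep(c("A", "B", "C", "D", "E"), each = 200),
                treatment = rep(rep(1:2, each = 4), length.out = 1000))
d$treatment[c(5, 6, 201)] <- NA

res <- ddply(d, .(region), summarize,
             N       = length(region),                      # all rows in the region
             N_known = sum(!is.na(treatment)),              # rows with a recorded treatment
             prop    = mean(treatment == 1, na.rm = TRUE))  # share of treatment 1 among known
res
```

Here N still counts everyone in the region, while prop uses only the rows where treatment is known.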
Upvotes: 0
Reputation: 6496
A dplyr solution:
library(dplyr)
d %>% group_by(region) %>% summarize(NumPat=n(),prop=sum(treatment==1)/n())
What we do here is group by region and then pipe it to summarize by the number of patients in each group, and then calculate the proportion of those patients that received treatment 1.
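One caveat about NAs: sum(treatment == 1) returns NA as soon as a group contains a missing treatment. The sketch below shows two possible denominators, depending on whether people with a missing treatment should count towards the total. The NA positions and the column names n_missing, prop_all and prop_known are assumptions for illustration:

```r
library(dplyr)

# Dummy data as in the question, with a few treatment values set to NA
# (which rows are NA is an assumption for this example).
d <- data.frame(id = 1:1000,
                region = rep(c("A", "B", "C", "D", "E"), each = 200),
                treatment = rep(rep(1:2, each = 4), length.out = 1000))
d$treatment[c(5, 6, 201)] <- NA

res <- d %>%
  group_by(region) %>%
  summarize(NumPat     = n(),                                   # everyone in the region
            n_missing  = sum(is.na(treatment)),                 # rows without a treatment
            prop_all   = sum(treatment == 1, na.rm = TRUE) / n(),  # denominator: everyone
            prop_known = mean(treatment == 1, na.rm = TRUE))    # denominator: known only
res
```

Multiply either proportion by 100 if you want it as a percentage.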
Upvotes: 4
Reputation: 13139
You could also use data.table:
library(data.table)
setDT(d)[, .(.N, prop = sum(treatment == 1) / .N), by = region]
region N prop
1: A 200 0.5
2: B 200 0.5
3: C 200 0.5
4: D 200 0.5
5: E 200 0.5
Upvotes: 2
Reputation: 1143
If I understand the question correctly, this can be done very easily (and quickly!) with table and prop.table:
prop.table(table(d$treatment, d$region))
This gives you each cell as a proportion of the grand total. If you want row- or column-wise proportions instead, use the margin parameter of prop.table:
prop.table(table(d$treatment, d$region), margin = 2) # column-wise
prop.table(table(d$treatment, d$region), margin = 1) # row-wise
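For the question as asked, the column-wise version is the relevant one, because the regions are the columns. A small sketch of pulling out both requested numbers with base R only, using the dummy data from the question (the names tab and col_prop are just for illustration):

```r
# Dummy data as in the question
d <- data.frame(id = 1:1000,
                region = rep(c("A", "B", "C", "D", "E"), each = 200),
                treatment = rep(rep(1:2, each = 4), length.out = 1000))

# (a) total number of people in each region
table(d$region)

# (b) proportion with treatment 1 within each region
tab <- table(d$treatment, d$region)      # rows: treatment, columns: region
col_prop <- prop.table(tab, margin = 2)  # each column sums to 1
col_prop["1", ]                          # 0.5 for every region with this dummy data
```

Note that table() drops NA values by default, so rows with a missing treatment are left out of both the counts in tab and the proportions; pass useNA = "ifany" if you want them shown as their own category.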
Upvotes: 1