Thirst for Knowledge

Reputation: 1628

R - Aggregate denominator based on condition, for use in percentage calculation for all rows

I have data that look like this:

population <- c(101:110)
coverage  <- c(91:100)
area <- c("Cambridge", "Cambridge","Cambridge", "Cambridge","Cambridge", "Oxford", "Oxford","Oxford", "Oxford","Oxford")
all <- data.frame(population,coverage,area) 

I then want a neat piece of R code that calculates the percentage of the population within an area that has mobile coverage. I know it's something like this (but not this):

coverage <- population x (coverage/100) / (aggregate(population, by=area, FUN=sum))

How do I calculate the sum of the population by area, for use as the denominator in the percentage calculation for all rows? Normally I would use aggregate to get the population by area, and then merge it back to the dataframe to use as the denominator, but that's not very elegant at all. I want the data to end up looking like this:

population <- c(101:110)
coverage  <- c(91:100)
area <- c("Cambridge", "Cambridge","Cambridge", "Cambridge","Cambridge", "Oxford", "Oxford","Oxford", "Oxford","Oxford")
percentage <- c(18, 18, 18, 18, 18, 19, 19, 19, 19, 19)
all <- data.frame(population,coverage,area, percentage) 

Help would be much appreciated.

Upvotes: 0

Views: 187

Answers (3)

mpschramm

Reputation: 550

You can do this with dplyr:

library(dplyr)

all.summary <- all %>%
    group_by(area) %>%
    mutate(percentage = population/sum(population)*(coverage/100))
all.summary


   population coverage      area percentage
        <int>    <int>    <fctr>      <dbl>
1         101       91 Cambridge  0.1784660
2         102       92 Cambridge  0.1822136
3         103       93 Cambridge  0.1860000
4         104       94 Cambridge  0.1898252
5         105       95 Cambridge  0.1936893
6         106       96    Oxford  0.1884444
7         107       97    Oxford  0.1922037
8         108       98    Oxford  0.1960000
9         109       99    Oxford  0.1998333
10        110      100    Oxford  0.2037037
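If you would rather avoid a package dependency, the same per-group denominator can be obtained in base R. The following is only a sketch, assuming the `ave()` function is acceptable as a replacement for the aggregate-and-merge step described in the question; it returns the group sum repeated for every row, so no merge is needed.

```r
# Rebuild the example data from the question
population <- 101:110
coverage <- 91:100
area <- rep(c("Cambridge", "Oxford"), each = 5)
all <- data.frame(population, coverage, area)

# ave() applies FUN within each area and returns a vector aligned with the rows
area.total <- ave(all$population, all$area, FUN = sum)

# Covered population in each row as a share of the area's total population
all$percentage <- all$population * (all$coverage / 100) / area.total
all
```

The `area.total` name is introduced here for illustration only. The resulting `percentage` column should match the dplyr output above.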

Upvotes: 0

Joy

Reputation: 769

I think you want dplyr summarise for this.

Does this achieve what you want?

library(dplyr)
all %>% group_by(area) %>% summarise(coveragePct=sum(coverage)/sum(population))
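Note that `summarise` returns one row per area rather than a value for every row. If `coverage` is read as the percentage of each row's population that is covered (an assumption about the data, not something the question states outright), one possible area-level figure is the population-weighted coverage. A minimal sketch, with `area.coverage` as an illustrative name:

```r
library(dplyr)

# Rebuild the example data from the question
population <- 101:110
coverage <- 91:100
area <- rep(c("Cambridge", "Oxford"), each = 5)
all <- data.frame(population, coverage, area)

# One row per area: covered population divided by total population
area.coverage <- all %>%
    group_by(area) %>%
    summarise(coveragePct = sum(population * coverage / 100) / sum(population))
area.coverage
```

Each area's value here equals the sum of the per-row shares that a grouped `mutate` would produce for that area.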

Upvotes: 0

aichao

Reputation: 7445

You can use dplyr, grouping the computation by area:

library(dplyr)
all %>% group_by(area) %>% mutate(percentage=population*(coverage/100)/sum(population))
##Source: local data frame [10 x 4]
##Groups: area [2]
##
##   population coverage      area percentage
##        <int>    <int>    <fctr>      <dbl>
##1         101       91 Cambridge  0.1784660
##2         102       92 Cambridge  0.1822136
##3         103       93 Cambridge  0.1860000
##4         104       94 Cambridge  0.1898252
##5         105       95 Cambridge  0.1936893
##6         106       96    Oxford  0.1884444
##7         107       97    Oxford  0.1922037
##8         108       98    Oxford  0.1960000
##9         109       99    Oxford  0.1998333
##10        110      100    Oxford  0.2037037
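The values above are fractions. If you want whole-number percentages like those in the question, a possible variation is to scale by 100 and round. This is a sketch only; the `result` name is illustrative, and `ungroup()` is added on the assumption that you do not want later operations to stay grouped by area.

```r
library(dplyr)

# Rebuild the example data from the question
population <- 101:110
coverage <- 91:100
area <- rep(c("Cambridge", "Oxford"), each = 5)
all <- data.frame(population, coverage, area)

result <- all %>%
    group_by(area) %>%
    # Same calculation as above, expressed as a rounded percentage
    mutate(percentage = round(100 * population * (coverage / 100) / sum(population))) %>%
    ungroup()
result
```

The rounded values will not exactly match the illustrative `percentage` vector in the question, since the true shares vary from row to row within each area.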

Upvotes: 0
