Combine aggregates at different levels of detail in R

Question

I have a table of employment by city and industry:

df <- read.table(text="city industry emp
Washington Auto 2
Washington Aero 2
Boston Auto 4
Boston Aero 2", header = TRUE)

I want to calculate a Relative Diversity Index by city, i.e. for each city, the inverse of the sum, over all industries, of the absolute difference between each industry's share of local employment and its share of national employment. The math looks like this: RDI_c = 1 / Σ_i |S_ci - S_i| (c indexes cities and i indexes industries; RDI is the index; S means share).

Using the above data, I should get:

city       rdi
Washington   5
Boston     7.5

Because:

RDI Washington = 1/(abs(2/4-6/10)+abs(2/4-4/10)) = 5
RDI Boston = 1/(abs(4/6-6/10)+abs(2/6-4/10)) = 7.5
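As a quick sanity check, the two hand calculations can be reproduced with plain arithmetic in R. The variable names are only for illustration; the national shares are 6/10 for Auto and 4/10 for Aero.

```r
# Washington: local shares 2/4 and 2/4, national shares 6/10 (Auto) and 4/10 (Aero)
rdi_washington <- 1 / (abs(2/4 - 6/10) + abs(2/4 - 4/10))

# Boston: local shares 4/6 and 2/6, same national shares
rdi_boston <- 1 / (abs(4/6 - 6/10) + abs(2/6 - 4/10))

# Compare with a tolerance, since these are floating-point results
isTRUE(all.equal(rdi_washington, 5))
isTRUE(all.equal(rdi_boston, 7.5))
```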

Of course, this is mock data; the real table has hundreds of cities and industries. I haven't been able to do this in R, even in multiple steps, short of splitting the df by city and then reassembling it, which seems very clunky.

Gregor Thomas · Accepted Answer

Lots of little steps, but this works:

library(dplyr)

# National share of each industry: industry employment / total employment
natl <- df %>%
    mutate(total_emp = sum(emp)) %>%
    group_by(industry) %>%
    summarize(si = sum(emp) / first(total_emp))

# Local share of each industry within its city, then the index per city
result <- df %>%
    group_by(city) %>%
    mutate(sci = emp / sum(emp)) %>%
    inner_join(natl, by = "industry") %>%
    group_by(city) %>%
    summarize(rdi = 1 / sum(abs(sci - si)))

result
# # A tibble: 2 × 2
#         city   rdi
# 1     Boston   7.5
# 2 Washington   5.0
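If the join feels heavy, the same index can also be sketched in base R by treating the data as a city-by-industry matrix. This is a minimal sketch that assumes `df` has the three columns shown in the question; the names `m`, `sci`, `si` and `rdi` are illustrative. One difference worth noting: `xtabs` fills city/industry pairs that have no row with zero, so a missing industry in a city still contributes |0 - S_i| to the sum, whereas the pipeline above only sums over rows present in `df`.

```r
df <- read.table(text = "city industry emp
Washington Auto 2
Washington Aero 2
Boston Auto 4
Boston Aero 2", header = TRUE)

# City x industry employment matrix (absent pairs become 0)
m <- xtabs(emp ~ city + industry, data = df)

# Local shares: each row divided by its city total
sci <- prop.table(m, margin = 1)

# National shares: industry totals divided by overall employment
si <- colSums(m) / sum(m)

# Subtract the national share from each column, then invert the row sums
rdi <- 1 / rowSums(abs(sweep(sci, 2, si)))
rdi
```

`rdi` is a named numeric vector; `data.frame(city = names(rdi), rdi = as.numeric(rdi))` turns it back into a table.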
