Reputation: 982
I have a table of employment by city and industry
df <- read.table(text="city industry emp
Washington Auto 2
Washington Aero 2
Boston Auto 4
Boston Aero 2", header = TRUE)
I want to calculate a Relative Diversity Index by city, i.e. the sum for each city, over all industries, of the absolute value of the difference between each industry’s share in local employment and its share in national employment. The math looks like this: RDIc = 1/∑i|Sci-Si| (small letters are subscripts: c for city and i for industry; RDI is the index; S means share).
Using the above data, I should get:
city rdi
Washington 5
Boston 7.5
Because:
Of course, this is mock data and I have 100s of cities and industries. I haven't been able to do this in R, even in multiple steps, short of splitting the df by city and then reassembling it, which seems very clunky.
Upvotes: 1
Views: 63
Reputation: 3502
This answer is based on the nice answer by @Gregor. I have just tried to reduce the steps so you don't need to create another data.frame and then join it
df1 <- df %>%
mutate(ind_total = sum(emp)) %>%
group_by(industry) %>%
mutate(si = sum(emp) / first(ind_total)) %>%
group_by(city) %>%
mutate(sci = emp / sum(emp)) %>%
summarise(rdi = 1 / sum(abs(sci - si)))
> df1
# A tibble: 2 × 2
city rdi
<fctr> <dbl>
1 Boston 7.5
2 Washington 5.0
Upvotes: 3
Reputation: 145825
Lots of little steps, but this works
library(dplyr)
natl = df %>%
mutate(ind_total = sum(emp)) %>%
group_by(industry) %>%
summarize(si = sum(emp) / first(ind_total)) %>%
select(industry, si)
result = df %>%
group_by(city) %>%
mutate(sci = emp / sum(emp)) %>%
inner_join(natl) %>%
group_by(city) %>%
summarize(rdi = 1 / sum(abs(sci - si)))
result
# # A tibble: 2 × 2
# city rdi
# <fctr> <dbl>
# 1 Boston 7.5
# 2 Washington 5.0
Upvotes: 2