Reputation: 2065
I have an Excel sheet containing employment numbers of each industry of each county in the US.
It looks like this:
County Industry Employees
a 1 49
a 2 1
b 1 4
b 2 19
...
I want to calculate the Herfindahl-Hirschman index (HHI) of employment in each county. I'm using R. Given some numbers, calculating the HHI is easy:
hhi <- function(x) {
# calculate sum
total <- sum(x)
# calculate share
share <- x*100/total
# add
return(sum(share^2))
}
So, for example, county 1 has a HHI of 9608 (= 98^2 + 2^2) and county 2 has a HHI of 7127.
But how can I create a new column with the HHI of that county?
Upvotes: 1
Views: 1473
Reputation: 886938
We can use ave
from base R
(no packages used)
df1$HHI <- with(df1, ave(Employees, County, FUN = hhi))
Upvotes: 3
Reputation: 214927
You can use dplyr
:
library(dplyr)
df %>% group_by(County) %>% mutate(HHI = sum((Employees/sum(Employees) * 100)^2))
# Source: local data frame [4 x 4]
# Groups: County [2]
# County Industry Employees HHI
# <fctr> <int> <int> <dbl>
# 1 a 1 50 9615.532
# 2 a 2 1 9615.532
# 3 b 1 4 7126.654
# 4 b 2 19 7126.654
Or equivalently, use data.table
:
setDT(df)[, HHI := sum((Employees/sum(Employees) * 100)^2), County][]
With your own customized function hhi
, since all the functions it calls are vectorized, you can directly use it with mutate
:
df %>% group_by(County) %>% mutate(HHI = hhi(Employees))
or:
setDT(df)[, HHI := hhi(Employees), County][]
Upvotes: 6