wwl
wwl

Reputation: 2065

Creating a new column based on function

I have an Excel sheet containing employment numbers of each industry of each county in the US.

It looks like this:

County   Industry  Employees
a        1         49
a        2         1
b        1         4
b        2         19
...

I want to calculate the Herfindahl-Hirschman index (HHI) of employment in each county. I'm using R. Given some numbers, calculating the HHI is easy:

hhi <- function(x) {
  # calculate sum
  total <- sum(x)
  
  # calculate share
  share <- x*100/total
  
  # add
  return(sum(share^2))
  
}

So, for example, county 1 has a HHI of 9608 (= 98^2 + 2^2) and county 2 has a HHI of 7127.

But how can I create a new column with the HHI of that county?

Upvotes: 1

Views: 1473

Answers (2)

akrun
akrun

Reputation: 886938

We can use ave from base R (no packages used)

df1$HHI <- with(df1, ave(Employees, County, FUN = hhi))

Upvotes: 3

akuiper
akuiper

Reputation: 214927

You can use dplyr:

library(dplyr)
df %>% group_by(County) %>% mutate(HHI = sum((Employees/sum(Employees) * 100)^2))

# Source: local data frame [4 x 4]
# Groups: County [2]

#   County Industry Employees      HHI
#   <fctr>    <int>     <int>    <dbl>
# 1      a        1        50 9615.532
# 2      a        2         1 9615.532
# 3      b        1         4 7126.654
# 4      b        2        19 7126.654

Or equivalently, use data.table:

setDT(df)[, HHI := sum((Employees/sum(Employees) * 100)^2), County][]

With your own customized function hhi, since all the functions it calls are vectorized, you can directly use it with mutate:

df %>% group_by(County) %>% mutate(HHI = hhi(Employees))

or:

setDT(df)[, HHI := hhi(Employees), County][]

Upvotes: 6

Related Questions