Reputation: 3
I'm having some difficulty calculating the Gini coefficient from binned census data, and would really appreciate any help.
My data looks a little something like this (but with 14,000 observations of 13 variables).
location <- c('A','B','C', 'D', 'E', 'F')
no_income <- c(20, 1, 40, 79, 12, 2)
income1 <- c(13, 4, 56, 17, 9, 4)
income2 <- c(27, 39, 49, 12, 19, 0)
income3 <- c(0, 1, 4, 3, 27, 0)
df <- data.frame(location, no_income, income1, income2, income3)
So for each observation there is a location given, and then a series of columns indicating how many households in the area earn within the given income bracket (so for location A, 20 households earn $0, 13 earn income1, 27 income2, and 0 income3).
I've created an empty column to return the results to:
df$gini = 0
I've then created a numeric vector (x) containing the income amount I want to use for each income bin:
x <- c(0, 300, 1000, 2000)
I've been trying to use the gini function within the reldist package, and have written the following for loop to cycle through each row of the data, apply the gini function and return the output to a new column.
for (i in 1:nrow(samp)){
w <- samp[i,2:5]
df$gini <- gini(x, w=rep(1, length=length(x)))
}
The problem is that the output returned is currently identical for every row, which is obviously not correct. I'm relatively new to this, though, and not sure what I'm doing wrong...
Reputation: 1316
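R vectorises many operations, so there's often no need to write an explicit loop. In this case you do need to go row by row, because gini() takes one vector of values and one vector of weights at a time. You also rarely need to initialise a container for the results beforehand (sometimes you might, but not often).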
Here's a working example using apply to loop over the rows:
# setup
install.packages("reldist")
library(reldist)
# dummy data
df = data.frame(ID=letters,
                Bin1=rpois(26, 3),
                Bin2=rpois(26, 8),
                Bin3=rpois(26, 1))
inc = c(0, 300, 1000)
# new column with gini
df$gini = apply(df[, 2:4], 1, function(i){
  gini(inc, i)
})
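For completeness, here is a sketch of the same pattern applied to the sample data from the question. It assumes the bin incomes x <- c(0, 300, 1000, 2000) given there, and treats the household counts in each row as the weights; the name counts is just a label I've picked for the row vector.

```r
library(reldist)  # provides gini(x, weights)

# sample data from the question
df <- data.frame(
  location  = c("A", "B", "C", "D", "E", "F"),
  no_income = c(20, 1, 40, 79, 12, 2),
  income1   = c(13, 4, 56, 17, 9, 4),
  income2   = c(27, 39, 49, 12, 19, 0),
  income3   = c(0, 1, 4, 3, 27, 0)
)

# assumed representative income for each bin, as in the question
x <- c(0, 300, 1000, 2000)

# one Gini value per row: the household counts weight the bin incomes
df$gini <- apply(df[, 2:5], 1, function(counts) gini(x, weights = counts))

print(df)
```

Only the count columns (2:5) are passed to apply, so the character location column doesn't force the rows to be converted to text.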
Worth noting that gini() defaults the weights argument to rep(1, length=length(x)), so if that's what you want you don't need to define it.
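That default is also why the loop in the question returns the same number for every row, as far as I can tell: w is computed but never passed to gini(), so each iteration measures the four bin incomes with equal weights, and the result is assigned to the whole column rather than to row i. If you'd rather keep a for loop, a minimal corrected sketch (using a three-row subset of the question's data, and assuming df is the intended data frame rather than samp) might look like this:

```r
library(reldist)  # provides gini(x, weights)

# three rows of the question's data, enough to show the pattern
df <- data.frame(
  location  = c("A", "B", "C"),
  no_income = c(20, 1, 40),
  income1   = c(13, 4, 56),
  income2   = c(27, 39, 49),
  income3   = c(0, 1, 4)
)
x <- c(0, 300, 1000, 2000)

# with the default (equal) weights the result depends only on x,
# so it is identical no matter which row you are on
equal_weight_gini <- gini(x)

# corrected loop: pass the row's counts as weights and fill one cell at a time
df$gini <- NA_real_
for (i in seq_len(nrow(df))) {
  w <- unlist(df[i, 2:5])           # household counts for row i as a plain vector
  df$gini[i] <- gini(x, weights = w)
}

print(df)
```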
Edit: I've added inclusion of weights, based on what I read in the manual: https://cran.r-project.org/web/packages/reldist/reldist.pdf.