Tom

Reputation: 61

Using tapply on two columns instead of one

I would like to calculate the Gini coefficient of several plots in R using the gini() function from the reldist package. I have a data frame from which I need to use two columns as input to the gini() function.

>  head(merged[,c(1,17,29)])
  idp c13     w
1  19 126 14.14
2  19 146 14.14
3  19  76 39.29
4  19  74 39.29
5  19  86 39.29
6  19  93 39.29

The gini() function takes the values as its first argument (c13 here) and the weights corresponding to each value as its second argument (w here).

So I need to use the columns c13 and w like this:

gini(merged$c13,merged$w)
[1] 0.2959369

The thing is, I want to do this for each plot (idp). I have about four thousand different values of idp, with dozens of values in the two other columns for each.

I thought I could do this using tapply(), but I can't pass two columns to the function through tapply().

tapply(list(merged$c13,merged$w), merged$idp, gini)

As expected, this does not work. What I would like to get as a result is a data frame like this:

 idp  Gini 
1  19 0.12 
2  21 0.45
3  35 0.65
4  65 0.23

Do you have any idea how to do this? Maybe with the plyr package? Thank you for your help!

Upvotes: 0

Views: 681

Answers (1)

Didzis Elferts

Reputation: 98579

You can use the ddply() function from the plyr package to calculate the coefficient for each level of idp. (For this example, I changed some of the idp values in the sample data frame to 21.)

library(plyr)
library(reldist)
ddply(merged,.(idp),summarize, Gini=gini(c13,w))

  idp       Gini
1  19 0.15307402
2  21 0.05006588
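
If you would rather stay in base R, here is a minimal sketch of two alternatives. The toy data frame below is only modelled on the head() output in the question; which rows get idp 21 is an assumption made here for illustration, so the numbers will not match the output above. The first variant keeps tapply() but applies it to the row indices instead of to a column, so the function can pick both c13 and w for each group. The second splits the data frame by idp and calls gini() on each piece.

```r
library(reldist)

# Toy data modelled on the question; assigning idp 21 to the last three rows
# is an illustrative assumption, not the original data.
merged <- data.frame(
  idp = c(19, 19, 19, 21, 21, 21),
  c13 = c(126, 146, 76, 74, 86, 93),
  w   = c(14.14, 14.14, 39.29, 39.29, 39.29, 39.29)
)

# Variant 1: tapply() over row indices, so both columns can be subset per group.
by_idx <- tapply(seq_len(nrow(merged)), merged$idp,
                 function(i) gini(merged$c13[i], merged$w[i]))

# Variant 2: split the data frame per plot and apply gini() to each piece.
by_split <- sapply(split(merged, merged$idp),
                   function(d) gini(d$c13, d$w))

# Reshape into the idp / Gini layout asked for in the question.
result <- data.frame(idp  = as.numeric(names(by_idx)),
                     Gini = as.vector(by_idx))
result
```

Both variants should give the same values per plot. With roughly four thousand groups either one ought to be fast enough, but I have not timed them against ddply().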

Upvotes: 1
