Drew Steen

Reputation: 16637

Use plyr to apply functions stored in lists

I'd like to use plyr to calculate multiple empirical cumulative distribution functions using ecdf(), and then apply those functions appropriately to entries in a data frame. For instance:

# Use the diamonds dataset in ggplot2
library(ggplot2)
library(plyr)

# Calculate an ecdf for each combination of cut and color
all_ecdfs <- dlply(diamonds, c("cut", "color"), function(x) ecdf(x$carat))

# Make a dataset of specific diamonds, which I want to compare to the larger set
# My particular subset of diamonds
my_diamonds <- ddply(diamonds, c("cut", "color"), summarise, 
               my.carat=runif(n=1, min=0.5, max=1))

If I were to do this manually, it would look something like this:

# Use the ecdf for the first entry: cut=="Fair" and color=="D"
my_diamonds$percentile <- NA
my_diamonds$percentile[my_diamonds$cut=="Fair" & my_diamonds$color=="D"] <- 
            all_ecdfs[["Fair.D"]](my_diamonds$my.carat[my_diamonds$cut=="Fair" & my_diamonds$color=="D"])

Seems like there should be some way to use ldply or lapply to do this automatically, but I can't figure it out.

Upvotes: 1

Views: 61

Answers (1)

jeremycg

Reputation: 24965

Here's how I would do it using dplyr to make the ecdfs, and vectorizing to get the values for your data.

#get ecdfs
library(dplyr)
z <- diamonds %>% group_by(cut, color) %>%
                  summarise(x = list(ecdf(carat)))

Now you have a dataframe z with the functions in a list in column x.
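To check that the list column really holds callable functions, you can pull one out and call it. This is only a small sketch: the name `fair_d_ecdf` and the carat value 0.7 are made up for illustration.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)

# Same grouping as above: one ecdf per cut/color combination
z <- diamonds %>% group_by(cut, color) %>%
                  summarise(x = list(ecdf(carat)))

# Extract the single ecdf for cut == "Fair" and color == "D"
# (fair_d_ecdf is an illustrative name, not from the original post)
fair_d_ecdf <- z$x[[which(z$cut == "Fair" & z$color == "D")]]

# An ecdf is an ordinary function: it returns the fraction of
# Fair/D diamonds whose carat is <= the value passed in
p <- fair_d_ecdf(0.7)
```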

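Now call the functions on our data. We go row by row through my_diamonds, find the ecdf matching that row's cut and color, and call it on that row's my.carat:

my_diamonds$percentile <- sapply(seq_len(nrow(my_diamonds)), function(i)
  z$x[z$cut == my_diamonds$cut[i] & z$color == my_diamonds$color[i]][[1]](my_diamonds$my.carat[i]))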
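If you would rather stay with plyr and the all_ecdfs list from the question, one possible sketch is to build the list names yourself and pair each ecdf with its row using mapply. It relies on dlply naming its elements "cut.color" (as in "Fair.D" in the question). The `keys` variable and the `set.seed()` call are my additions, the latter only to make the random subset reproducible.

```r
library(ggplot2)  # provides the diamonds dataset
library(plyr)

set.seed(1)  # assumption: added only so the sketch is reproducible

# One ecdf per cut/color combination, as in the question
all_ecdfs <- dlply(diamonds, c("cut", "color"), function(d) ecdf(d$carat))

# One random carat value per cut/color combination, as in the question
my_diamonds <- ddply(diamonds, c("cut", "color"), summarise,
                     my.carat = runif(n = 1, min = 0.5, max = 1))

# Rebuild the names dlply gave the list elements, e.g. "Fair.D"
keys <- paste(my_diamonds$cut, my_diamonds$color, sep = ".")

# Call each row's own ecdf on that row's carat value
my_diamonds$percentile <- mapply(function(f, carat) f(carat),
                                 all_ecdfs[keys], my_diamonds$my.carat,
                                 USE.NAMES = FALSE)
```

The lookup by name means the result does not depend on my_diamonds and all_ecdfs being in the same order.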

Upvotes: 1
