Omry Atia

In R, apply a complicated function (not base) on several columns by group (dplyr)

I have a data frame, df, on which I would like to run the function kepdf (from the package pdfCluster, which estimates multivariate densities). The point is that this is not a simple base function like head, mean, and the like.

My data frame looks like this:

> head(df)
# A tibble: 6 x 4
      A     B     C Group
  <dbl> <dbl> <dbl> <dbl>
      2     1    39     1
      2     2    66     1
      2     2    36     1
      1     1    56     1
      1     1    37     1
      1     1    45     1

Now, I would like to calculate the density of columns A, B, and C for each Group separately (the variable Group just indicates the group the observation belongs to and should not enter the density calculation). I naively tried the following:

df %>% group_by(Group) %>% select(1:3) %>% do(kepdf(.))

and got the following error:

Adding missing grouping variables: `Group`
Error in kepdf(.) : NA/NaN/Inf in foreign function call (arg 2)

Now, there are no missing values in the data, so I'm confused. Also, I don't want the grouping variable Group added back, because it would then enter the density calculation.

Any thoughts?

Answers (1)

CPak

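Your issue is that you're grouping your data.frame by Group and then trying to discard the grouping column before calling kepdf(...). On a grouped data frame, select(...) always keeps the grouping column (hence the message "Adding missing grouping variables: `Group`"), so do(...) still passes Group to kepdf(...). Within each group that column is constant, which is the likely source of the NA/NaN/Inf error.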
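To see this for yourself, here is a minimal sketch on made-up data. The toy data frame and its values are assumptions for illustration, not your actual df. It compares the columns that survive select(1:3) on a grouped versus an ungrouped data frame.

```r
library(dplyr)

# Toy data (hypothetical values, same column layout as in the question)
toy <- data.frame(
  A = c(2, 2, 1, 1),
  B = c(1, 2, 1, 1),
  C = c(39, 66, 56, 37),
  Group = c(1, 1, 2, 2)
)

# Grouped: select() re-adds the grouping column and prints
# "Adding missing grouping variables: `Group`"
grouped_cols <- toy %>% group_by(Group) %>% select(1:3) %>% names()

# Ungrouped: only the three selected columns remain
ungrouped_cols <- toy %>% select(1:3) %>% names()

print(grouped_cols)
print(ungrouped_cols)
```

So as long as the data frame is grouped, Group travels along with whatever you select.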

Try instead

library(dplyr)
library(purrr)
library(pdfCluster)
df %>% split(.$Group) %>% map(~ select(.x, 1:3)) %>% map(~ kepdf(.x))

You can also combine the last two map(...) calls into a single function:

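myfun <- function(df) {
    # Keep only the three measurement columns (this drops Group)
    data <- dplyr::select(df, 1:3)
    # Estimate the density on those columns only
    pdfCluster::kepdf(data)
}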

df %>% split(.$Group) %>% map(., ~myfun(.x))
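If you would rather stay within dplyr, a possible alternative (not the approach above, and assuming dplyr 0.8.1 or later) is group_map(), which by default hands each group to the function without its grouping columns and returns a list. The sketch below uses made-up data and a stand-in function, per_group, in place of kepdf so that it runs without pdfCluster; both are hypothetical.

```r
library(dplyr)

# Toy data (hypothetical values, same column layout as in the question)
toy <- data.frame(
  A = c(2, 2, 1, 1, 3, 2),
  B = c(1, 2, 1, 1, 2, 3),
  C = c(39, 66, 56, 37, 45, 50),
  Group = c(1, 1, 1, 2, 2, 2)
)

# Stand-in for kepdf(): any function that takes the per-group columns.
# Swap in pdfCluster::kepdf once the pattern works on your data.
per_group <- function(d) colMeans(d)

res <- toy %>%
  group_by(Group) %>%
  group_map(~ per_group(.x))  # .x holds A, B, C only; Group is dropped

print(res)
```

The result is a list with one element per group, in the same spirit as the split(...) and map(...) version.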
