learning_code

Reputation: 1

Is there an R function/approach that enables you to apply a custom function to each group of a grouped data frame?

In R, I am attempting to apply a custom function to each individual group of a data frame that is grouped by a particular column. The function is designed to take a subset (one group) of the whole data frame and return a modified data frame. Ideally, the final output would be a single data frame containing the same groups, with each group modified by the function before being bound back into the whole data frame.

For example:

employee <- c('John Doe','Peter Gynn','Jolie Hope'...)
month <- c('Jan','Feb','Mar'...)
monthlysalary <- c(21000, 23400, 26800...)
startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'...))
employ.data <- data.frame(employee, month, monthlysalary, startdate, stringsAsFactors=FALSE)

I would like to apply some custom function:

func <- function(df_subset) {
## function mutates monthlysalary of the data frame - cannot include exact code for privacy reasons ##
}

to the original data frame grouped by the column employee (employ.data %>% group_by(employee)), so that the function changes only the monthly salary column of each employee in isolation. The function relies on the row indices within each group, and since the groups are of unequal length, I have to apply it to each group separately.

I have struggled to get any of the dplyr apply-style functions to work for this.

Any help would be greatly appreciated. Thank you.

Upvotes: 0

Views: 70

Answers (2)

ThomasIsCoding

Reputation: 101373

With base R, you can use ave() to update salary within each employee group. Note that FUN here receives the salary vector of one group, not a data frame subset:

employ.data <- within(employ.data, salary <- ave(salary,employee,FUN = func))

Example with Dummy Data

employee <- c('John Doe','Peter Gynn','Jolie Hope','John Doe')
month <- c('Jan','Feb','Mar','Feb')
salary <- c(21000, 23400, 26800,22000)
startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14','2010-11-1'))
employ.data <- data.frame(employee, month, salary, startdate, stringsAsFactors=FALSE)

# a dummy function whose result depends on the size of the subset
func <- function(x) x + 100*length(x)

Then you will get:

> employ.data
    employee month salary  startdate
1   John Doe   Jan  21200 2010-11-01
2 Peter Gynn   Feb  23500 2008-03-25
3 Jolie Hope   Mar  26900 2007-03-14
4   John Doe   Feb  22200 2010-11-01
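Since the question mentions that the real function depends on row positions within each group, here is a minimal sketch of how that might look with ave(). The function raise_by_position is hypothetical and only for illustration, and the sketch assumes the rows are already in the desired order within each employee.

```r
# Same dummy data as above, reduced to the two columns that matter here
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope', 'John Doe')
salary <- c(21000, 23400, 26800, 22000)
employ.data <- data.frame(employee, salary, stringsAsFactors = FALSE)

# Hypothetical function: adds 100 times the row's position within its group.
# ave() passes in the salary vector of one group at a time, so seq_along(x)
# is the within-group row index.
raise_by_position <- function(x) x + 100 * seq_along(x)

employ.data$salary <- ave(employ.data$salary, employ.data$employee,
                          FUN = raise_by_position)
```

ave() returns a vector of the same length and in the same row order as its input, so the original row order of employ.data is preserved.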

Upvotes: 0

Ronak Shah

Reputation: 388982

If your function expects a subset of the data frame, you can do this with a split-apply-combine approach:

do.call(rbind, lapply(split(employ.data, employ.data$employee), func))
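To make the one-liner above concrete, here is a small sketch using dummy data. The body of func is a hypothetical stand-in for the asker's private function: it is assumed to take one employee's rows and return them with salary changed according to the within-group row index.

```r
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope', 'John Doe')
month <- c('Jan', 'Feb', 'Mar', 'Feb')
salary <- c(21000, 23400, 26800, 22000)
employ.data <- data.frame(employee, month, salary, stringsAsFactors = FALSE)

# Hypothetical stand-in for the real function: receives a data frame holding
# one employee's rows and returns a data frame with the same columns.
func <- function(df_subset) {
  df_subset$salary <- df_subset$salary + 100 * seq_len(nrow(df_subset))
  df_subset
}

pieces <- split(employ.data, employ.data$employee)  # one data frame per employee
result <- do.call(rbind, lapply(pieces, func))      # apply func, then stack the pieces
rownames(result) <- NULL                            # drop the group-derived row names
```

Unlike ave(), this returns the rows ordered by group rather than in their original order, so re-sort afterwards if the original order matters.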

In the tidyverse, the same can be done with:

library(dplyr)
library(purrr)

employ.data %>% group_split(employee) %>% map_df(func)
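Since the question starts from employ.data %>% group_by(employee), another option that may fit is dplyr's group_modify(), which calls a function once per group and binds the results back together. The sketch below is only an illustration: func is again a hypothetical stand-in for the private function, and it assumes the function does not need the employee column itself, because group_modify() passes each group's rows without the grouping column and adds that column back afterwards.

```r
library(dplyr)

employ.data <- data.frame(
  employee = c('John Doe', 'Peter Gynn', 'Jolie Hope', 'John Doe'),
  salary = c(21000, 23400, 26800, 22000),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the real function: modifies salary based on the
# row index within the group it receives.
func <- function(df_subset) {
  df_subset$salary <- df_subset$salary + 100 * seq_len(nrow(df_subset))
  df_subset
}

result <- employ.data %>%
  group_by(employee) %>%
  group_modify(~ func(.x)) %>%  # .x is one group's rows, minus the employee column
  ungroup()
```

If the function needs to know which employee it is working on, the group key is available inside the formula as .y, a one-row tibble.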

Upvotes: 0
