Reputation: 1
In R, I am attempting to apply a custom function to each group of a data frame that is grouped by a particular column. The function takes a subset (one group) of the whole data frame and returns a modified data frame. Ideally, the final output would be a single data frame containing the same groups, with each group modified by the function before being bound back together.
For example:
employee <- c('John Doe','Peter Gynn','Jolie Hope'...)
month <- c('Jan','Feb','Mar'...)
monthlysalary <- c(21000, 23400, 26800...)
startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'...))
employ.data <- data.frame(employee, month, monthlysalary, startdate, stringsAsFactors=FALSE)
I would like to apply some custom function:
func <- function(df_subset) {
  # modifies monthlysalary of the data frame subset
  # (exact code cannot be included for privacy reasons)
}
to the original data frame grouped by the column employee (employ.data %>% group_by(employee)). The function only changes the monthly salary column of each employee. It uses the row indexes within each group, and since the groups are of unequal length, I have to apply it to each group in isolation.
I have struggled to successfully use any of the dplyr apply functions.
Any help would be greatly appreciated. Thank you.
Upvotes: 0
Views: 70
Reputation: 101373
With base R, you can use the following code to update salary by groups of employee. Note that ave passes only the salary vector of each group to func, not a data frame subset:
employ.data <- within(employ.data, salary <- ave(salary,employee,FUN = func))
Example with Dummy Data
employee <- c('John Doe','Peter Gynn','Jolie Hope','John Doe')
month <- c('Jan','Feb','Mar','Feb')
salary <- c(21000, 23400, 26800,22000)
startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14','2010-11-1'))
employ.data <- data.frame(employee, month, salary, startdate, stringsAsFactors=FALSE)
# a dummy function whose result depends on the size of the subset
func <- function(x) x + 100*length(x)
Then, after running the within() call above, you will get
> employ.data
    employee month salary  startdate
1   John Doe   Jan  21200 2010-11-01
2 Peter Gynn   Feb  23500 2008-03-25
3 Jolie Hope   Mar  26900 2007-03-14
4   John Doe   Feb  22200 2010-11-01
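Since the question mentions that the real function depends on row indexes within each group, here is a minimal sketch of how that could look with ave. Because func receives a plain vector here, positions within the group are available through seq_along(x). The function raise_by_position and its rule (add 100 per row position) are hypothetical, chosen only for illustration:

```r
# Dummy data, reduced to the two columns that matter here
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope', 'John Doe')
salary <- c(21000, 23400, 26800, 22000)
employ.data <- data.frame(employee, salary, stringsAsFactors = FALSE)

# Hypothetical function: x is the salary vector of ONE employee;
# seq_along(x) gives the row position inside that group (1, 2, ...)
raise_by_position <- function(x) x + 100 * seq_along(x)

# ave() applies the function per group and keeps the original row order
employ.data$salary <- ave(employ.data$salary, employ.data$employee,
                          FUN = raise_by_position)
```

If the function needs other columns as well, ave is not enough, and an approach that passes the whole data frame subset is a better fit.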
Upvotes: 0
Reputation: 388982
If your function expects a subset of the data frame, we can do it with the split-apply-combine approach:
do.call(rbind, lapply(split(employ.data, employ.data$employee), func))
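To make the split-apply-combine line concrete, here is a small self-contained sketch. The body of func is a hypothetical stand-in for the real (undisclosed) function, assuming it changes salary based on the row index within the group:

```r
# Dummy data
employee <- c('John Doe', 'Peter Gynn', 'Jolie Hope', 'John Doe')
month <- c('Jan', 'Feb', 'Mar', 'Feb')
salary <- c(21000, 23400, 26800, 22000)
employ.data <- data.frame(employee, month, salary, stringsAsFactors = FALSE)

# Hypothetical stand-in: takes one employee's rows and returns them
# with salary adjusted by the row index inside the group
func <- function(df_subset) {
  df_subset$salary <- df_subset$salary + 100 * seq_len(nrow(df_subset))
  df_subset
}

# split: one data frame per employee; apply: func on each; combine: rbind
groups <- split(employ.data, employ.data$employee)
result <- do.call(rbind, lapply(groups, func))

# rbind builds row names from the group names; reset them to 1..n
rownames(result) <- NULL
```

Note that the rows of result come back ordered by group rather than in the original row order.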
In the tidyverse, that could be applied with:
library(dplyr)
library(purrr)
employ.data %>% group_split(employee) %>% map_df(func)
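Another dplyr option that stays close to the group_by call in the question is group_modify, which passes each group's rows (without the grouping column) to a function and expects a data frame back. A minimal sketch, again with a hypothetical stand-in for func:

```r
library(dplyr)

employ.data <- data.frame(
  employee = c('John Doe', 'Peter Gynn', 'Jolie Hope', 'John Doe'),
  salary = c(21000, 23400, 26800, 22000),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in: adjusts salary by the row index inside the group
func <- function(df_subset) {
  df_subset$salary <- df_subset$salary + 100 * seq_len(nrow(df_subset))
  df_subset
}

# .x is the subset of rows for the current group (employee column excluded);
# group_modify adds the grouping column back to the combined result
result <- employ.data %>%
  group_by(employee) %>%
  group_modify(~ func(.x)) %>%
  ungroup()
```

As with the split-based versions, the result is ordered by group, and the function must not return the grouping column itself.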
Upvotes: 0