Reputation: 13
I searched the forum for a similar problem but could not find one that matches mine exactly.
I have an R data frame with a grouping column and value columns of types such as double and Date. I want to write a function that groups the data frame and creates a new column that, per group, (1) is NA if the value column contains only NA, or (2) is, say, the maximum if the value column contains at least one non-NA value. I have attempted the following:
library(dplyr)
a <- c("A", "A", "B", "B", "C", "C")
b <- c(1,2,NA,NA,NA,6)
c <- as.Date(c("2021-01-01", "2021-01-02", NA,
NA, NA, "2021-01-06"))
df <- data.frame("Group" = a, "Value" = b, "Date" = c)
take_max <- function(data, group, value, new_col_name, fun) {
  data %>%
    group_by({{ group }}) %>%
    mutate({{ new_col_name }} := if_else(
      all(is.na({{ value }})),
      fun(NA),
      max({{ value }}, na.rm = TRUE)
    ))
}
df %>% take_max(Group, Date, min_max, fun = as.Date)
df %>% take_max(Group, Value, min_max, fun = as.numeric)
It seems to work, but I get the following warning:
Warning messages:
1: Problem with `mutate()` input `new_col`.
i no non-missing arguments to max; returning -Inf
i Input `new_col` is `if_else(all(is.na(Value)), fun(NA), max(Value, na.rm = TRUE))`.
i The error occurred in group 2: Group = "B".
2: In max(~Value, na.rm = TRUE) :
no non-missing arguments to max; returning -Inf
My understanding of the problem is that, for group B, if_else also checks whether max({{ value }}, na.rm = TRUE) (which in this case is equivalent to max(c())) has the same type as fun(NA), and therefore evaluates both options. I tried replacing if_else with ifelse, but then the Date type is not preserved.
Would anyone have an idea of how to handle that?
Upvotes: 1
Views: 468
Reputation: 436
Try this:
take_max <- function(data, group, value, new_col_name) {
  data %>%
    group_by({{ group }}) %>%
    mutate({{ new_col_name }} := if (all(is.na({{ value }}))) NA else max({{ value }}, na.rm = TRUE))
}
take_max(df, Group, Value, min_max)
take_max(df, Group, Date, min_max)
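The likely reason this avoids the warning: if_else is an ordinary function, so both of its value arguments are evaluated even when the condition is TRUE, whereas if / else only runs the branch that is selected. Here is a minimal sketch of that difference on a single all-NA vector. The helper warns() is a hypothetical name introduced only for this illustration and is not part of dplyr.

```r
library(dplyr)

# An all-NA "group", like Group B in the example data
x <- c(NA_real_, NA_real_)

# Hypothetical helper: TRUE if evaluating `expr` raises a warning, else FALSE.
# `expr` is evaluated lazily, i.e. only once it is touched inside tryCatch().
warns <- function(expr) {
  tryCatch({
    expr
    FALSE
  }, warning = function(w) TRUE)
}

# if_else() evaluates both value arguments, so max() runs on the all-NA vector
eager_warns <- warns(if_else(all(is.na(x)), NA_real_, max(x, na.rm = TRUE)))

# if / else evaluates only the chosen branch, so max() is never called here
lazy_warns <- warns(if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE))

print(eager_warns)  # TRUE
print(lazy_warns)   # FALSE
```

Since the condition all(is.na(...)) is a single TRUE or FALSE per group, a plain if / else is enough and no vectorised if_else is needed.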
If you don't want multiple records per group, you can replace mutate with summarise.
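As a rough sketch of that variant, assuming dplyr 1.0 or later and the same example data as in the question (the function name take_max_summary is made up for this illustration):

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  Group = c("A", "A", "B", "B", "C", "C"),
  Value = c(1, 2, NA, NA, NA, 6),
  Date  = as.Date(c("2021-01-01", "2021-01-02", NA, NA, NA, "2021-01-06"))
)

# Hypothetical name; same logic as take_max, but one row per group
take_max_summary <- function(data, group, value, new_col_name) {
  data %>%
    group_by({{ group }}) %>%
    summarise(
      # Only the selected branch runs, so max() never sees an all-NA group
      {{ new_col_name }} := if (all(is.na({{ value }}))) NA else max({{ value }}, na.rm = TRUE)
    )
}

res_value <- take_max_summary(df, Group, Value, min_max)
res_date  <- take_max_summary(df, Group, Date, min_max)

print(res_value)
print(res_date)
```

This relies on dplyr combining the plain NA from the all-NA group with the Date or double results of the other groups into a single column of that type. If every group were all NA, the column would presumably come out as logical.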
Upvotes: 1