I have a data frame into which several data sources have been merged, which produces several rows with the same id. Now I want to define which values from which row should be kept.
So far I have been using dplyr's group_by() and summarise_all() to keep, for each id, the first value that is not NA.
Here's an example:
library(dplyr)

# function f for summarizing: drop the NAs, then keep the first remaining value
f <- function(x) {
  x <- na.omit(x)
  if (length(x) > 0) first(x) else NA
}
# test data
test <- data.frame(id = c(1, 2, 1, 2), value1 = c("a", NA, "b", "c"), value2 = 0:3)
id value1 value2
1 a 0
2 <NA> 1
1 b 2
2 c 3
Summarising by id gives the following result:
test <- test %>% group_by(id) %>% summarise_all(funs(f))
id value1 value2
1 a 0
2 c 1
My question: skipping NA values (via na.omit) already works, but how can I also skip zeros, so that the first non-zero value is kept instead of 0? The expected result is:
id value1 value2
1 a 2
2 c 1
As a side note to @RicS's side note: as of dplyr 1.0.0, summarise_all() is superseded (it still works, but is no longer the recommended interface). You should use across() instead:
test %>%
  group_by(id) %>%
  summarise(across(everything(), f))
Alternatively, you can write the f function as:
library(dplyr)
f <- function(x) x[!is.na(x) & x != 0][1]
test %>% group_by(id) %>% summarise(across(everything(), f))
# id value1 value2
# <dbl> <chr> <int>
#1 1 a 2
#2 2 c 1
Indexing with [1] automatically returns NA when a group contains no value that is both non-NA and non-zero.
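A quick base-R check of that claim, using the same one-line f: when the logical filter selects nothing, the result is a zero-length vector, and asking for its first element gives an NA of the vector's own type (NA_character_ for a character column, NA_integer_ for an integer one). The input vectors below are made up for illustration.

```r
f <- function(x) x[!is.na(x) & x != 0][1]

# Nothing survives the filter: [1] on an empty vector returns a typed NA.
f(c(NA_character_, NA_character_))  # returns NA_character_
f(c(0L, 0L, NA))                    # returns NA_integer_

# Otherwise the first value that is neither NA nor 0 is returned.
f(c(0L, 5L, 7L))                    # returns 5L
```

This is why the one-liner needs no explicit `else NA` branch.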
Alternatively, you can just modify your f function so that it keeps only the elements that differ from zero:
f <- function(x) {
  x <- na.omit(x)   # drop NAs
  x <- x[x != 0]    # drop zeros
  if (length(x) > 0) first(x) else NA
}
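The modified f is applied to value1, a character column, as well as to value2. A small base-R check, independent of dplyr, suggests why x != 0 is still safe there: when a character vector is compared with a number, R converts the number to a string first, so 0 becomes "0" and ordinary strings compare unequal to it. One consequence worth knowing is that a literal "0" string would also be filtered out.

```r
# Same values as the value1 column in the question.
value1 <- c("a", NA, "b", "c")
kept <- value1[!is.na(value1)]   # roughly what na.omit() leaves (without its attributes)

# 0 is converted to "0" before comparing, so every remaining string differs from it.
kept != 0         # [1] TRUE TRUE TRUE

# A literal "0" string, by contrast, counts as equal to 0 and would be dropped.
c("0", "a") != 0  # [1] FALSE  TRUE
```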
Side note: as of dplyr 0.8.0, funs() is deprecated. You should use a lambda, a list of functions, or a list of lambdas instead. In this case I used a single lambda:
test %>%
  group_by(id) %>%
  summarise_all(~f(.))
# A tibble: 2 x 3
id value1 value2
<dbl> <chr> <int>
1 1 a 2
2 2 c 1
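For comparison, a minimal sketch of the same rule (per id, keep the first value that is neither NA nor 0) in base R without dplyr. It swaps group_by()/summarise() for split() plus lapply(); first_valid is a hypothetical helper name used only for this illustration. The id is taken directly from each group instead of going through first_valid, so an id of 0 would not be lost.

```r
# Hypothetical helper: first element that is neither NA nor 0 (NA if none).
first_valid <- function(x) x[!is.na(x) & x != 0][1]

test <- data.frame(id = c(1, 2, 1, 2),
                   value1 = c("a", NA, "b", "c"),
                   value2 = 0:3)

# Split the rows by id, collapse each group to a single row, then stack the rows.
groups <- split(test, test$id)
result <- do.call(rbind, lapply(groups, function(g) {
  cols <- lapply(g[setdiff(names(g), "id")], first_valid)  # every non-id column
  data.frame(id = g$id[1], cols)
}))
rownames(result) <- NULL
result
```

With the question's test data this should reproduce the expected table (id 1: a, 2; id 2: c, 1). The dplyr versions above are more concise; this variant only shows that nothing in the rule depends on dplyr.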