Reputation: 53
I want to select non-NA values by id and by year for both variables v1 and v2 (in my actual dataset there are more than two variables like v1 and v2), unless there are only NAs.
I also want to set it so that if values by id and year conflict, I choose the last row.
> id<- c(1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5,5)
> year<-c(2014, 2014, 2019, 2014, 2014, 2014, 2020, 2018, 2018, 2017, 2018, 2019, 2019)
> v1<- c("a", NA, NA, NA, "b", "a", "b", NA, NA, "c", NA, "a", NA)
> v2<- c(NA, "x", "y", "z", "z","z","z", "x", NA, NA, "x", "x", "x")
> df <- as.data.frame(cbind(id, year, v1, v2))
> df
id year v1 v2
1 1 2014 a <NA>
2 1 2014 <NA> x
3 1 2019 <NA> y
4 2 2014 <NA> z
5 2 2014 b z
6 2 2014 a z
7 2 2020 b z
8 3 2018 <NA> x
9 3 2018 <NA> <NA>
10 4 2017 c <NA>
11 4 2018 <NA> x
12 5 2019 a x
13 5 2019 <NA> x
Here is what I want my result to look like:
id year v1 v2
1 1 2014 a x
3 1 2019 <NA> y
6 2 2014 a z
7 2 2020 b z
8 3 2018 <NA> x
10 4 2017 c <NA>
11 4 2018 <NA> x
12 5 2019 a x
I was trying to use dplyr with group_by, but having issues getting it to run. Thanks!
Upvotes: 2
Views: 1688
Reputation: 30474
After grouping, you can try summarizing and taking the last
value after removing NA
.
library(dplyr)
df %>%
group_by(id, year) %>%
summarise(v1 = last(na.omit(v1)),
v2 = last(na.omit(v2)))
To use summarise
with all columns (except those used in grouping), you can do:
df %>%
group_by(id, year) %>%
summarise_all(~ last(na.omit(.)))
Output
id year v1 v2
<chr> <chr> <chr> <chr>
1 1 2014 a x
2 1 2019 NA y
3 2 2014 a z
4 2 2020 b z
5 3 2018 NA x
6 4 2017 c NA
7 4 2018 NA x
8 5 2019 a x
Upvotes: 2