Reputation: 3768
Still trying to get to grips with the tidyr package. If one has a data set with redundant rows like this:
require(dplyr)
require(tidyr)
data <-
  data.frame(
    v1 = c("ID1", NA, "ID2", NA),
    v2 = c("x", NA, "xx", NA),
    v3 = c(NA, "z", NA, "zz"),
    v4 = c(22, 22, 6, 6),
    v5 = c(5, 5, 9, 9)) %>%
  tbl_df()
> data
Source: local data frame [4 x 5]
   v1 v2 v3 v4 v5
1 ID1  x NA 22  5
2  NA NA  z 22  5
3 ID2 xx NA  6  9
4  NA NA zz  6  9
Since the ID variables v1 to v3 are split across redundant rows with many NAs (and the two measurements, v4 and v5, are therefore repeated), one would like to get something like this:
   v1 v2 v3 v4 v5
1 ID1  x  z 22  5
2 ID2 xx zz  6  9
What would be a general way of getting this using tidyr? I feel it could be done using gather(), but how?
Upvotes: 0
Views: 329
Reputation: 887048
You may also do
library(dplyr)
data %>%
  mutate(v3 = v3[!is.na(v3)][cumsum(is.na(v3))]) %>%
  na.omit()
# v1 v2 v3 v4 v5
#1 ID1 x z 22 5
#2 ID2 xx zz 6 9
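To see why the indexing in mutate() works, here is a small sketch of the same steps on a plain vector. The names non_missing, idx and filled are only illustrative. Note that the trick leans on the pattern in the sample data, where v3 starts with an NA and every NA is followed by exactly one value; it is not a general fill.

```r
# Worked example of the index trick on a plain character vector.
v3 <- c(NA, "z", NA, "zz")

non_missing <- v3[!is.na(v3)]   # the observed values, in order: "z" "zz"
idx <- cumsum(is.na(v3))        # running count of NAs so far: 1 1 2 2

# Each position picks the observed value whose rank equals the number of
# NAs seen so far, so every NA is replaced by the value that follows it.
filled <- non_missing[idx]
print(filled)
# [1] "z"  "z"  "zz" "zz"
```

After that, na.omit() drops the rows that still have NAs in v1 and v2, leaving one complete row per ID.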
Or, based on the data shown:
data %>%
  mutate(v3 = lead(as.character(v3))) %>%
  na.omit()
Upvotes: 2
Reputation: 23574
One way would be like this. Using na.locf() from the zoo package, I replaced the NAs in v1. Then I grouped the data by that variable. I employed na.locf() one more time to take care of v3, this time filling from the last value backward (fromLast = TRUE). Finally, I removed the rows with NAs in v2.
library(zoo)
library(dplyr)
mutate(data, v1 = na.locf(v1)) %>%
  group_by(v1) %>%
  mutate(v3 = na.locf(v3, fromLast = TRUE)) %>%
  filter(complete.cases(v2)) %>%
  ungroup()
# v1 v2 v3 v4 v5
#1 ID1 x z 22 5
#2 ID2 xx zz 6 9
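Since the question asked about tidyr, the same fill-then-filter idea can probably be sketched without zoo. This is only a sketch, assuming a tidyr version that provides fill() with the .direction argument and that respects dplyr grouping; the name result is illustrative, and stringsAsFactors = FALSE is added here so the ID columns are plain character vectors.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question, with character (not factor) columns.
data <- data.frame(
  v1 = c("ID1", NA, "ID2", NA),
  v2 = c("x", NA, "xx", NA),
  v3 = c(NA, "z", NA, "zz"),
  v4 = c(22, 22, 6, 6),
  v5 = c(5, 5, 9, 9),
  stringsAsFactors = FALSE
)

result <- data %>%
  fill(v1, .direction = "down") %>%  # carry each ID down into the NA row below it
  group_by(v1) %>%
  fill(v3, .direction = "up") %>%    # within each ID, pull v3 up from the later row
  ungroup() %>%
  filter(!is.na(v2))                 # keep the one complete row per ID

print(result)
```

Like the na.locf() version, this assumes the first row of each ID carries v1 and v2 and a later row of the same ID carries v3.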
Upvotes: 2