Reputation: 1203
I have the following data structure:
library(dplyr)
test_data <- data.frame(some_dimension = c(rep("first",6),rep("second",6)),
first_col = c(rep(NA,3),rep(1,3),rep(NA,3),rep(0,3)),
second_col = c(rep(NA,3),rep(0,3),rep(NA,3),rep(1,3)),
third_col = c(rep(NA,3),rep(1,3),rep(NA,3),rep(1,3)))
some_dimension first_col second_col third_col
1 first NA NA NA
2 first NA NA NA
3 first NA NA NA
4 first 1 0 1
5 first 1 0 1
6 first 1 0 1
7 second NA NA NA
8 second NA NA NA
9 second NA NA NA
10 second 0 1 1
11 second 0 1 1
12 second 0 1 1
I would like to get the following data structure:
expexted_data <- data.frame(some_dimension = c(rep("first",6),rep("second",6)),
first_col = c(rep(0,3),rep(1,3),rep(1,3),rep(0,3)),
second_col = c(rep(1,3),rep(0,3),rep(0,3),rep(1,3)),
third_col = c(rep(0,3),rep(1,3),rep(0,3),rep(1,3)))
some_dimension first_col second_col third_col
1 first 0 1 0
2 first 0 1 0
3 first 0 1 0
4 first 1 0 1
5 first 1 0 1
6 first 1 0 1
7 second 1 0 0
8 second 1 0 0
9 second 1 0 0
10 second 0 1 1
11 second 0 1 1
12 second 0 1 1
That is I would like to fill the missing value with the oposite of the first non-missing value (grouped by some_dimension), where the values ranges in (0,1).
What I have tried last was the following. Its basically finding all non missings and take the smallest index. However I have some difficulties of applying that function properly:
my_fun <- function(x){
all_non_missings <- which(!is.na(x))
first_non_missing <- min(all_non_missings)
if(.data[first_non_missing] == 1){
is.na(x) <- rep(0, length.out = length(x))
} else {
is.na(x) <- rep(1, length.out = length(x))
}
}
test_data %>% group_by(some_dimension) %>% mutate_if(is.numeric, funs(new = my_fun(.)))
Where I get always some errors like:
Error in mutate_impl(.data, dots): Evaluation error: (list) object cannot be coerced to type 'double'.
Traceback:
for instance
Upvotes: 3
Views: 294
Reputation: 11480
data.table
setDT(test_data)[, lapply(.SD, function(x){x[is.na(x)]<-(1 - as.integer(mean(x, na.rm = T)));x}) , by = some_dimension][]
# some_dimension first_col second_col third_col
# 1: first 0 1 0
# 2: first 0 1 0
# 3: first 0 1 0
# 4: first 1 0 1
# 5: first 1 0 1
# 6: first 1 0 1
# 7: second 1 0 0
# 8: second 1 0 0
# 9: second 1 0 0
#10: second 0 1 1
#11: second 0 1 1
#12: second 0 1 1
Upvotes: 1
Reputation: 2210
Try the na.locf
function from the "zoo" package:
library(zoo)
test_data %>%
group_by(some_dimension) %>%
mutate_if(is.numeric,funs(ifelse(is.na(.),1-na.locf(.,fromLast=TRUE),.)))
# some_dimension first_col second_col third_col
#1 first 0 1 0
#2 first 0 1 0
#3 first 0 1 0
#4 first 1 0 1
#5 first 1 0 1
#6 first 1 0 1
#7 second 1 0 0
#8 second 1 0 0
#9 second 1 0 0
#10 second 0 1 1
#11 second 0 1 1
#12 second 0 1 1
Or shorter :
test_data %>%
group_by(some_dimension) %>%
mutate_if(is.numeric,funs(coalesce(.,1-na.locf(.,fromLast=TRUE))))
Upvotes: 2
Reputation: 16121
Here's a solution for the example you posted:
test_data <- data.frame(some_dimension = c(rep("first",6),rep("second",6)),
first_col = c(rep(NA,3),rep(1,3),rep(NA,3),rep(0,3)),
second_col = c(rep(NA,3),rep(0,3),rep(NA,3),rep(1,3)),
third_col = c(rep(NA,3),rep(1,3),rep(NA,3),rep(1,3)))
library(dplyr)
test_data %>%
group_by(some_dimension) %>%
mutate_all(~ifelse(is.na(.), 1-unique(.[!is.na(.)]), .)) %>%
ungroup()
# # A tibble: 12 x 4
# some_dimension first_col second_col third_col
# <fct> <dbl> <dbl> <dbl>
# 1 first 0 1 0
# 2 first 0 1 0
# 3 first 0 1 0
# 4 first 1 0 1
# 5 first 1 0 1
# 6 first 1 0 1
# 7 second 1 0 0
# 8 second 1 0 0
# 9 second 1 0 0
#10 second 0 1 1
#11 second 0 1 1
#12 second 0 1 1
Upvotes: 1