Reputation: 522
When I visualized my data, it had a series of periodicities in row. But Randomforest imputing and PCA imputing create outlier.
Main problem :
So i want to average on both sides in row. When the NA
value of the nth column occurs, the average of n-1
column and n+1
column impute corresponding row.
Sub problem
But first column and last column doesn't have n-1
or n+1
, therefore i will take n+1 or n-1.(don't worry, deviation of row is very small.)
It occur na value in row continuosly. It also take n-1
or n+1
.
EX:
tr <- structure(list(A_1 = c(NA,2,3,4,5), A_2 =c(4,5,6,NA,8), A_3 = c(7,9,NA,10,11),
A_4 = c(10,12,NA,13,NA), A_5 =c(12,NA,14,15,16), A_6 = c(13,15,15,16,17)),
row.names = c(NA, -5L),class = "data.frame")
> tr
A_1 A_2 A_3 A_4 A_5 A_6
1 NA 4 7 10 12 13
2 2 5 9 12 NA 15
3 3 6 NA NA 14 15
4 4 NA 10 13 15 16
5 5 8 11 NA 16 17
Desired output
> tr
A_1 A_2 A_3 A_4 A_5 A_6
1 4 4 7 10 12 13
2 2 5 9 12 13.5 15
3 3 6 6 14 14 15
4 4 7 10 13 15 16
5 5 8 11 13.5 16 17
Upvotes: 3
Views: 97
Reputation: 51592
One way via dplyr
is to convert to long format, take the lag()
and lead()
of your value column, compute the row means, replace NA and convert back to wide. i.e.
library(dplyr)
library(tidyr)
tr %>%
pivot_longer(everything()) %>%
mutate(n1 = lag(value), n2 = lead(value)) %>%
mutate(res = rowMeans(select(., c(n1, n2)), na.rm = TRUE),
value = replace(value, is.na(value), res[is.na(value)])) %>%
select(name, value) %>%
pivot_wider(names_from = name, values_from = value) %>%
unnest()
which gives,
A_1 A_2 A_3 A_4 A_5 A_6
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4 4 7 10 12 13
2 2 5 9 12 13.5 15
3 3 6 6 14 14 15
4 4 7 10 13 15 16
5 5 8 11 13.5 16 17
Upvotes: 2
Reputation: 35624
A base R solution:
t(apply(tr, 1, function(x){
ifelse(!is.na(x), x,
rowMeans(cbind(c(NA, x[-length(x)]), c(x[-1], NA)), na.rm = T))
}))
# A_1 A_2 A_3 A_4 A_5 A_6
# [1,] 4 4 7 10.0 12.0 13
# [2,] 2 5 9 12.0 13.5 15
# [3,] 3 6 6 14.0 14.0 15
# [4,] 4 7 10 13.0 15.0 16
# [5,] 5 8 11 13.5 16.0 17
Upvotes: 1
Reputation: 33613
Using data.table
:
# Define helper function
foo <- function(x) sapply(transpose(shift(x, -1:1)), mean, na.rm = TRUE)
setDT(tr)[, melt(.SD, measure.vars = patterns("^A"))
][, value := fcoalesce(value, foo(value)), by = rowid(variable)
][, dcast(.SD, rowid(variable) ~ variable)
][, !"variable"]
A_1 A_2 A_3 A_4 A_5 A_6
1: 4 4 7 10.0 12.0 13
2: 2 5 9 12.0 13.5 15
3: 3 6 6 14.0 14.0 15
4: 4 7 10 13.0 15.0 16
5: 5 8 11 13.5 16.0 17
Or avoid the melting/dcasting with apply()
from base R:
tr[] <- t(apply(tr, 1, function(x) fcoalesce(x, foo(x))))
A_1 A_2 A_3 A_4 A_5 A_6
1 4 4 7 10.0 12.0 13
2 2 5 9 12.0 13.5 15
3 3 6 6 14.0 14.0 15
4 4 7 10 13.0 15.0 16
5 5 8 11 13.5 16.0 17
Upvotes: 1