Reputation: 193
I want to know how to replace NA
values in a specific column by groups with an interpolation. Some of my groups only have one non-NA I would like to fill those groups with the one non-NA
If I have a dataframe like:
Group Value
ALB NA
ALB 10
ALB NA
ALB 12
ARE NA
ARE NA
ARE 2
ARE NA
ARE NA
ARG 4
ARG NA
ARG 6
I want to create a new column, so my dataframe would look like the following:
Group Value New Column
ALB NA 9
ALB 10 10
ALB NA 11
ALB 12 12
ARE NA 2
ARE NA 2
ARE 2 2
ARE NA 2
ARE NA 2
ARG 4 4
ARG NA 5
ARG 6 6
Upvotes: 1
Views: 1458
Reputation: 269654
This one-liner will interpolate the NAs by group and for NAs on the ends of a group will extend the nearest non-NA to it giving it the same value, i.e. it does linear interpolation and constant extrapolation, which is not exactly what was asked for but may be close enough. Note that this also implies that if there is only one non-NA then all NAs are set to it.
library(zoo)
transform(DF, newCol = ave(Value, Group, FUN = function(x) na.approx(x, rule = 2)))
giving:
Group Value newCol
1 ALB NA 10
2 ALB 10 10
3 ALB NA 11
4 ALB 12 12
5 ARE NA 2
6 ARE NA 2
7 ARE 2 2
8 ARE NA 2
9 ARE NA 2
10 ARG 4 4
11 ARG NA 5
12 ARG 6 6
DF <- structure(list(Group = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Value = c(NA,
10L, NA, 12L, 4L, NA, NA, 7L)), class = "data.frame", row.names = c(NA,
-8L))
DF <-
structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L), .Label = c("ALB", "ARE", "ARG"), class = "factor"),
Value = c(NA, 10L, NA, 12L, NA, NA, 2L, NA, NA, 4L, NA, 6L
)), class = "data.frame", row.names = c(NA, -12L))
Upvotes: 3
Reputation: 20095
One can consider using Hmisc::approxExtrap
function to perform both interpolate
and extrapolate
missing values. You have to provide both x
and y
values without NA
which will be used as reference
to interpolate/extrapolate
missing values. The function returns for required set/rows (as passed with argument xout
) after interpolating/extrapolating
.
library(Hmisc)
library(dplyr)
df %>% group_by(Group) %>%
mutate(newVal =
approxExtrap(which(!is.na(Value)), Value[!is.na(Value)],xout = 1:n(), rule=2)$y) %>%
as.data.frame()
# Group Value newVal
# 1 1 NA 9
# 2 1 10 10
# 3 1 NA 11
# 4 1 12 12
# 5 2 4 4
# 6 2 NA 5
# 7 2 NA 6
# 8 2 7 7
Data:
df <- read.table(text =
"Group Value
1 NA
1 10
1 NA
1 12
2 4
2 NA
2 NA
2 7",
header = TRUE)
Upvotes: 0
Reputation: 84529
df <- data.frame(
group = rep(1:2, each = 4),
value = c(NA, 10, NA, 12, 4, NA, NA, 7))
complete <- function(x){
i <- which.min(is.na(x))
y <- seq_along(x) + x[i] - i
return(y)
}
newdf <- do.call(rbind,
lapply(split(df, df$group),
function(dat){
transform(dat, newvalue=complete(value))
}))
rownames(newdf) <- NULL
This gives:
> newdf
group value newvalue
1 1 NA 9
2 1 10 10
3 1 NA 11
4 1 12 12
5 2 4 4
6 2 NA 5
7 2 NA 6
8 2 7 7
Upvotes: 1
Reputation: 523
Check out na.approx
in zoo
package:
https://www.rdocumentation.org/packages/zoo/versions/1.8-2/topics/na.approx
You can use split
on your data and then use apply
.
> library(zoo)
> df<- data.frame(
group = rep(1:2, each = 4),
value = c(NA, 10, NA, 12, 4, NA, NA, 7))
> df
group value
1 1 NA
2 1 10
3 1 NA
4 1 12
5 2 4
6 2 NA
7 2 NA
8 2 7
> dfl <- split(df$value, df$group)
> dfl
$`1`
[1] NA 10 NA 12
$`2`
[1] 4 NA NA 7
> lapply(dfl, na.approx)
$`1`
[1] 10 11 12
$`2`
[1] 4 5 6 7
However, this will work if you have upper and lower limit in each group. Otherwise you will have issue like in the first group where it couldn't determine what to replace NA with.
Upvotes: 0