Reputation: 678
I have a data frame like this:
> head(df1)
iso year var1 var2 var3
1 XXX 2005 165 29 2151
2 XXX 2006 160 21 2139
3 XXX 2007 NA NA NA
4 XXX 2008 184 9 3640
5 XXX 2009 NA NA NA
6 YYY 2005 206 461 8049
I want to replace the NA
's of intermittent years based on the years around it and the NA
's in years at the beginning and end of the range by carrying backward and forward the outer most non-NA observation.
My code to do this for one column is:
df1 %>%
group_by(iso) %>%
mutate(var1 = na.approx(var1, na.rm = FALSE, rule = 1)) %>%
mutate(var1 = na.locf(var1, na.rm = FALSE)) %>%
mutate(var1 = na.locf(var1, na.rm = FALSE, fromLast = TRUE))
This works, so now I want to do this for all columns in one go (there are more than 3 and they are not numbered like in my example). This I pieced together from the answers to this question. I omitted the two calls to na.locf
.
columnnames <- c("var1, "var2", "var3")
df1 %>%
group_by(iso) %>%
mutate_at(.vars = vars(columnnames), .funs = funs(na.approx(., na.rm = FALSE, rule = 1)))
This throws me an error and a warning:
Error in approx(x[!na], y[!na], xout, ...) : need at least two non-NA values to interpolate In addition: Warning message: In xy.coords(x, y, setLab = FALSE) : NAs introduced by coercion
I think I understand the error, but I did not get it when I used the first piece of code on var1
. The warning I don't follow. How cal I apply my code to all columns in my data frame? I also tried putting evertything in a loop, looping over columnnames
but that didn't work either (and it it probably not the best way to go about this).
Upvotes: 5
Views: 1727
Reputation: 269491
Use na.approx
with method = "constant"
(same as na.locf
) and rule = 2
(means extend nearest value to leading and trailing NAs). If you want the NAs to be linearly interpolated instead remove the method="constant"
argument.
df1 %>%
group_by(iso) %>%
mutate_at(vars(-iso), funs(na.approx(., method = "constant", rule = 2))) %>%
ungroup
giving:
# A tibble: 6 x 5
iso year var1 var2 var3
<fct> <dbl> <dbl> <dbl> <dbl>
1 XXX 2005 165 29 2151
2 XXX 2006 160 21 2139
3 XXX 2007 160 21 2139
4 XXX 2008 184 9 3640
5 XXX 2009 184 9 3640
6 YYY 2005 206 461 8049
df1
in reproducible form is:
df1 <-
structure(list(iso = structure(c(1L, 1L, 1L, 1L, 1L, 2L), .Label = c("XXX",
"YYY"), class = "factor"), year = c(2005L, 2006L, 2007L, 2008L,
2009L, 2005L), var1 = c(165L, 160L, NA, 184L, NA, 206L), var2 = c(29L,
21L, NA, 9L, NA, 461L), var3 = c(2151L, 2139L, NA, 3640L, NA,
8049L)), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6"))
Upvotes: 3
Reputation: 20085
You can re-write your code using mutate_at
so that conversion can be done in one go as:
library(dplyr)
library(zoo)
df %>%
group_by(iso) %>%
mutate_at(vars(starts_with("var")),
funs(na.locf(na.locf(na.approx(., na.rm = FALSE, rule = 1),na.rm=FALSE),
fromLast=TRUE)))
# # A tibble: 6 x 5
# # Groups: iso [2]
# iso year var1 var2 var3
# <chr> <int> <dbl> <dbl> <dbl>
# 1 XXX 2005 165 29.0 2151
# 2 XXX 2006 160 21.0 2139
# 3 XXX 2007 172 15.0 2890
# 4 XXX 2008 184 9.00 3640
# 5 XXX 2009 184 9.00 3640
# 6 YYY 2005 206 461 8049
#
Data:
df <- read.table(text=
"iso year var1 var2 var3
1 XXX 2005 165 29 2151
2 XXX 2006 160 21 2139
3 XXX 2007 NA NA NA
4 XXX 2008 184 9 3640
5 XXX 2009 NA NA NA
6 YYY 2005 206 461 8049",
header = TRUE, stringsAsFactors = FALSE)
Upvotes: 2
Reputation: 47300
Here is a base solution:
ave(df,df$iso, FUN =function(y){
if(nrow(y) > 1) y[3:5] <- lapply(y[3:5], function(x) approx(y$year,x,y$year,rule=2)$y)
y
})
# iso year var1 var2 var3
# 1 XXX 2005 165 29 2151.0
# 2 XXX 2006 160 21 2139.0
# 3 XXX 2007 172 15 2889.5
# 4 XXX 2008 184 9 3640.0
# 5 XXX 2009 184 9 3640.0
# 6 YYY 2005 206 461 8049.0
Upvotes: 2
Reputation: 39154
We can use mutate_at
. The key is to specify the right columns in the vars
argument, which uses the same rule as the select
function. Therefore, in this case, vars(starts_with("var"))
will also work.
library(dplyr)
library(zoo)
df1 %>%
group_by(iso) %>%
mutate_at(vars(-iso, -year), funs(na.approx(., na.rm = FALSE, rule = 1))) %>%
mutate_at(vars(-iso, -year), funs(na.locf(., na.rm = FALSE))) %>%
mutate_at(vars(-iso, -year), funs(na.locf(., na.rm = FALSE, fromLast = TRUE)))
# # A tibble: 6 x 5
# # Groups: iso [2]
# iso year var1 var2 var3
# <chr> <int> <dbl> <dbl> <dbl>
# 1 XXX 2005 165 29 2151
# 2 XXX 2006 160 21 2139
# 3 XXX 2007 172 15 2890.
# 4 XXX 2008 184 9 3640
# 5 XXX 2009 184 9 3640
# 6 YYY 2005 206 461 8049
DATA
df1 <- read.table(text = " iso year var1 var2 var3
1 XXX 2005 165 29 2151
2 XXX 2006 160 21 2139
3 XXX 2007 NA NA NA
4 XXX 2008 184 9 3640
5 XXX 2009 NA NA NA
6 YYY 2005 206 461 8049 ",
header = TRUE, stringsAsFactors = FALSE)
Upvotes: 2