Reputation: 1644
Based on the last function in the dplyr package, if you want to take the last element of a vector excluding NAs, you can just wrap the vector in na.omit:
library(dplyr)
x <- c(1:10,NA)
last(x)
# [1] NA
last(na.omit(x))
# [1] 10
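The same idea can also be written in plain base R, without dplyr. The sketch below uses last_non_na, a hypothetical helper name (not a dplyr or base R function). It subsets with !is.na() instead of na.omit and returns NA when every element is NA.

```r
# Hypothetical helper (not part of dplyr or base R): returns the last
# non-NA element of x, or NA if x has no non-NA elements.
last_non_na <- function(x) {
  kept <- x[!is.na(x)]   # plain subset; no "na.action" attribute is added
  if (length(kept) == 0) NA else kept[length(kept)]
}

x <- c(1:10, NA)
last_non_na(x)
# [1] 10
```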
I would like to impute the last non-NA element of var1 for each id. The following is an example of the data frame used:
id<-rep(c(1,2,3),c(3,2,2))
var1<-c(5,1,4,2,NA,NA,NA)
df<-data.frame(id,var1)
df
# id var1
# 1 1 5
# 2 1 1
# 3 1 4
# 4 2 2
# 5 2 NA
# 6 3 NA
# 7 3 NA
Notice that id=1 contains only numeric values for var1, id=2 contains one numeric value and one NA, while id=3 contains only NAs and no numeric values.
I would like to obtain the following:
df
# id var1
# 1 1 4
# 2 1 4
# 3 1 4
# 4 2 2
# 5 2 2
# 6 3 NA
# 7 3 NA
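For reference, a minimal sketch that reproduces this desired output without dplyr. It swaps in base R's ave(), which applies a function within each level of id and recycles a length-one result over the group. This is an alternative technique, not the dplyr approach the question asks about. With the data above it should give var1 = 4, 4, 4, 2, 2, NA, NA.

```r
# Data from the question
id   <- rep(c(1, 2, 3), c(3, 2, 2))
var1 <- c(5, 1, 4, 2, NA, NA, NA)
df   <- data.frame(id, var1)

# Within each id, replace every value with that group's last non-NA value,
# or NA when the group has no non-NA values (base R ave(), not dplyr)
df$var1 <- ave(df$var1, df$id, FUN = function(v) {
  kept <- v[!is.na(v)]
  if (length(kept) == 0) NA else kept[length(kept)]
})
df
```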
Here is what I did to achieve what I wanted, but I got an error.
df %>%
group_by(id) %>%
mutate(var1=ifelse(length(na.omit(var1))==0,NA,last(na.omit(var1))))
# Error: Unsupported vector type language
EDIT1: Based on the comments, the above code works for dplyr 0.4.3 but apparently not for dplyr 0.5.0 (in my case). Additionally, I want to impute using the last element, not the element with the maximum value, so I have changed my data frame to make it more general.
EDIT2: I have considered a data frame that lists all possible cases. There are three: (1) all numeric, (2) numeric + NAs, and (3) all NAs.
Upvotes: 2
Views: 1486
Reputation: 635
I had a similar issue. This worked for me:
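df %>%
group_by(id) %>%
mutate(missing = is.na(var1)) %>%   # flag the NA rows within each group
# if the group has any non-NA value, take the last one; otherwise NA
mutate(var1 = ifelse(any(!missing), var1[!missing][length(var1[!missing])], NA))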
Upvotes: 0
Reputation: 3496
I was asked to explain my solution, but I actually don't fully understand why OP's solution doesn't work. Initially I thought it was due to the class of object returned by na.omit:
> na.omit(var1)
[1] 1 2 3 4
attr(,"na.action")
[1] 5
attr(,"class")
[1] "omit"
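A small base R check of the attribute difference shown above, assuming a vector like c(1:4, NA) (which matches the printed output). It only demonstrates that na.omit attaches an na.action attribute that plain logical subsetting does not; it does not establish that this attribute is what triggers the dplyr error.

```r
v <- c(1:4, NA)

omitted  <- na.omit(v)    # values 1 2 3 4, plus an "na.action" attribute
subset_v <- v[!is.na(v)]  # same values, no extra attributes

attributes(omitted)       # lists na.action (position 5, class "omit")
attributes(subset_v)      # NULL
```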
But then I noticed that nth (and I think last is just a wrapper for it) works fine:
df %>%
group_by(id) %>%
mutate(var1=nth(na.omit(var1),-1L))
An alternative is to use tail rather than last:
df %>%
group_by(id) %>%
mutate(var1=tail(na.omit(var1),1))
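One thing to keep in mind with tail, checked here in base R against the three cases from EDIT2: when a group has no non-NA values, tail(na.omit(v), 1) returns a zero-length vector rather than NA, while mutate generally expects a result of length 1 or the group size. The list names below are illustrative only.

```r
# The three cases from EDIT2: all numeric, numeric + NA, all NA
cases <- list(all_numeric = c(5, 1, 4),
              mixed       = c(2, NA),
              all_na      = c(NA_real_, NA_real_))

# Length of tail(na.omit(v), 1) in each case: 1, 1 and 0
tail_lengths <- vapply(cases, function(v) length(tail(na.omit(v), 1)), integer(1))
tail_lengths
```

A length check with an NA fallback, like the any(!missing) guard in the first answer, is one way to cover the all-NA case.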
Or to create a new function, as I initially did:
aa <- function(x) last(na.omit(x))
df %>% group_by(id) %>% mutate(var1=aa(var1))
I was just curious about any differences in performance, so I checked them out, but I would say they are equivalent:
Unit: microseconds
expr min lq mean median uq max neval
mutate(var1 = nth(na.omit(var1), -1L)) 795.270 830.4880 1022.196 897.6375 1026.795 4437.483 1000
mutate(var1 = tail(na.omit(var1))) 791.035 825.6165 1011.288 892.6270 1037.463 3406.842 1000
mutate(var1 = aa(var1)) 788.085 825.5180 1108.872 888.9945 1036.664 102915.926 1000
Upvotes: 1
Reputation: 2359
Using the dplyr package, we can group by id, take the maximum value within each id, and use it to replace var1:
library(dplyr)
df <- df %>%
group_by(id) %>%
mutate(var1 = max(var1, na.rm = TRUE))
df
id var1
<dbl> <int>
1 1 3
2 1 3
3 1 3
4 2 4
5 2 4
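Note that the maximum is not in general the last value, which is why EDIT1 rules this approach out. In addition, for an all-NA group, max(..., na.rm = TRUE) returns -Inf with a warning instead of NA. A base R illustration using the id = 1 and id = 3 values from the current version of the question:

```r
v_id1 <- c(5, 1, 4)              # var1 for id = 1 in the current data
max(v_id1)                       # 5: the maximum
v_id1[length(v_id1)]             # 4: the last value, which the question asks for

v_id3 <- c(NA_real_, NA_real_)   # var1 for id = 3 (all NA)
max_id3 <- suppressWarnings(max(v_id3, na.rm = TRUE))
max_id3                          # -Inf rather than NA
```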
Upvotes: 0