HNSKD

Reputation: 1644

Using mutate and last functions with NAs

With the last() function from the dplyr package, you can take the last element of a vector while excluding NAs by first passing the vector through na.omit():

library(dplyr)
x <- c(1:10,NA)
last(x)
# [1] NA
last(na.omit(x))
# [1] 10
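For readers without dplyr, the same idea can be sketched in base R. This is my own assumption of an equivalent, not something dplyr provides: drop the NAs with logical indexing, then take the final element with tail() from the utils package.

```r
x <- c(1:10, NA)

# Keep only the non-missing values, then take the last one
tail(x[!is.na(x)], 1)
# [1] 10
```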

I would like to impute var1 with its last non-missing value within each id. Here is an example of the data frame:

id <- rep(c(1, 2, 3), c(3, 2, 2))
var1 <- c(5, 1, 4, 2, NA, NA, NA)
df <- data.frame(id, var1)
df
#   id var1
# 1  1    5
# 2  1    1
# 3  1    4
# 4  2    2
# 5  2   NA
# 6  3   NA
# 7  3   NA

Notice that for var1, id = 1 contains only numeric values, id = 2 contains one numeric value and one NA, and id = 3 contains only NAs. I would like to obtain the following:

df
#   id var1
# 1  1    4
# 2  1    4
# 3  1    4
# 4  2    2
# 5  2    2
# 6  3   NA
# 7  3   NA

Here is what I tried, but it gives an error:

df %>%
  group_by(id) %>%
  mutate(var1 = ifelse(length(na.omit(var1)) == 0, NA, last(na.omit(var1))))
# Error: Unsupported vector type language

EDIT1: According to the comments, the code above works with dplyr 0.4.3 but apparently not with dplyr 0.5.0, which is what I am using. Also, I want to impute using the last element, not the element with the maximum value, so I have changed my data frame to make it more general.

EDIT2: The data frame now covers all three possible cases: (1) all numeric, (2) numeric plus NAs, and (3) all NAs.
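For reference, here is a base R sketch of the desired result that avoids dplyr altogether, and so any dplyr version issue. It relies on a helper I am calling last_non_na (a hypothetical name, not a base R or dplyr function) and uses ave() to apply it within each id; ave() recycles the single returned value over the rows of the group.

```r
# Hypothetical helper (not part of base R or dplyr):
# last non-missing value of x, or an NA of the same type if there is none
last_non_na <- function(x) {
  v <- x[!is.na(x)]
  if (length(v) == 0L) x[NA_integer_] else v[length(v)]
}

id <- rep(c(1, 2, 3), c(3, 2, 2))
var1 <- c(5, 1, 4, 2, NA, NA, NA)
df <- data.frame(id, var1)

# The three cases from EDIT2
last_non_na(c(5, 1, 4))             # all numeric  -> 4
last_non_na(c(2, NA))               # numeric + NA -> 2
last_non_na(c(NA_real_, NA_real_))  # all NA       -> NA

# Apply the helper within each id
df$var1 <- ave(df$var1, df$id, FUN = last_non_na)
df$var1
# [1]  4  4  4  2  2 NA NA
```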

Upvotes: 2

Views: 1486

Answers (3)

Nat

Reputation: 635

I had a similar issue. This worked for me:

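library(dplyr)

df %>%
  group_by(id) %>%
  mutate(missing = is.na(var1)) %>%
  # last non-missing value in the group, or NA if the group has none
  mutate(var1 = ifelse(any(!missing), var1[!missing][length(var1[!missing])], NA)) %>%
  select(-missing)  # drop the helper column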

Upvotes: 0

Dambo

Reputation: 3496

I was asked to explain my solution, but I don't fully understand why the OP's code fails. At first I thought it was due to the class of the object returned by na.omit:

> na.omit(var1)
[1] 1 2 3 4
attr(,"na.action")
[1] 5
attr(,"class")
[1] "omit"
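The printed attr(,"class") line is easy to misread. A short base R check (my own sketch, using a fresh vector rather than the OP's data) suggests that the result of na.omit() is still a plain numeric vector, and that the "omit" class belongs to its na.action attribute rather than to the vector itself:

```r
y <- na.omit(c(1, 2, 3, 4, NA))

class(y)                     # "numeric": the vector keeps its ordinary type
attr(y, "na.action")         # position of the dropped NA (5), stored as an attribute
class(attr(y, "na.action"))  # "omit": this class is on the attribute, not on y
```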

But then I noticed that nth() works fine (I think last() is just a wrapper around it):

df %>%
  group_by(id) %>%
  mutate(var1 = nth(na.omit(var1), -1L))

An alternative is to use tail() rather than last():

df %>%
  group_by(id) %>%
  mutate(var1 = tail(na.omit(var1), 1))
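One caveat, which I have not checked against dplyr 0.5.0 specifically: for an id whose var1 values are all NA (id = 3 in the question), tail(na.omit(var1), 1) returns a zero-length vector, while mutate() expects a result of length 1 or the group size. A base R sketch of the problem and a guarded variant (safe_tail is a hypothetical name, not a library function):

```r
all_na <- c(NA_real_, NA_real_)

# tail() of an empty vector is empty, not NA
length(tail(na.omit(all_na), 1))
# [1] 0

# Hypothetical guarded version: fall back to a typed NA when nothing is left
safe_tail <- function(x) {
  v <- tail(na.omit(x), 1)
  if (length(v) == 0L) x[NA_integer_] else v
}

safe_tail(all_na)      # NA
safe_tail(c(5, 1, 4))  # 4
```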

Or to create a new function, as I initially did:

aa <- function(x) last(na.omit(x))
df %>% group_by(id) %>% mutate(var1=aa(var1))

Out of curiosity I compared their performance, and they look roughly equivalent:

Unit: microseconds
                                          expr     min       lq     mean   median       uq        max neval
mutate(var1 = nth(na.omit(var1), -1L)) 795.270 830.4880 1022.196 897.6375 1026.795   4437.483  1000
mutate(var1 = tail(na.omit(var1)))     791.035 825.6165 1011.288 892.6270 1037.463   3406.842  1000
mutate(var1 = aa(var1))                788.085 825.5180 1108.872 888.9945 1036.664 102915.926  1000

Upvotes: 1

Arun kumar mahesh

Reputation: 2359

Using the dplyr package, we can group by id, take the maximum of var1 within each id, and use it to replace var1:

library(dplyr)

df <- df %>%
  group_by(id) %>%
  mutate(var1 = max(var1, na.rm = TRUE))

df
     id  var1
  <dbl> <int>
1     1     3
2     1     3
3     1     3
4     2     4
5     2     4
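Note that max() answers a different question from the one in EDIT1. A small base R comparison on the question's current data (my own check, not part of this answer) shows that for id = 1 the maximum is 5 while the last value is 4, and that for an all-NA group max(..., na.rm = TRUE) gives -Inf with a warning rather than NA:

```r
var1 <- c(5, 1, 4, 2, NA, NA, NA)
id <- rep(c(1, 2, 3), c(3, 2, 2))

# Per-group maximum (warning for the all-NA group suppressed)
by_max <- tapply(var1, id, function(v) suppressWarnings(max(v, na.rm = TRUE)))

# Per-group last non-missing value, NA if the group has none
by_last <- tapply(var1, id, function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0L) NA_real_ else v[length(v)]
})

by_max   # 5, 2, -Inf
by_last  # 4, 2, NA
```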

Upvotes: 0
