New York Crosser
New York Crosser

Reputation: 65

Select first non-NA value using R

df<-data.frame(ID = c(1,1,1,2,3,3,3),
          test = c(NA, 5.5, 6.4, NA, 7.3, NA, 10.9))

I want to create a variable called "value", which is the first non-NA value for the test for each individual ID. For individual ID 2 who only has the NA, the value is NA.

The expected output is:

df<-data.frame(ID = c(1,1,1,2,3,3,3),
           test = c(NA, 5.5, 6.4, NA, 7.3, NA, 10.9),
           value = c(5.5, 5.5, 5.5, NA, 7.3, 7.3, 7.3))

Upvotes: 2

Views: 1160

Answers (3)

ThomasIsCoding
ThomasIsCoding

Reputation: 101044

Here is a data.table option using first + na.omit

> setDT(df)[, value := first(na.omit(test)), ID][]
   ID test value
1:  1   NA   5.5
2:  1  5.5   5.5
3:  1  6.4   5.5
4:  2   NA    NA
5:  3  7.3   7.3
6:  3   NA   7.3
7:  3 10.9   7.3

Upvotes: 1

GKi
GKi

Reputation: 39647

You can use ave to group by ID and which.max to select with [ the first non NA value.

df$value <- ave(df$test, df$ID, FUN=function(x) x[which.max(!is.na(x))])
df
#  ID test value
#1  1   NA   5.5
#2  1  5.5   5.5
#3  1  6.4   5.5
#4  2   NA    NA
#5  3  7.3   7.3
#6  3   NA   7.3
#7  3 10.9   7.3

Upvotes: 3

akrun
akrun

Reputation: 886938

We can use first on the non-NA elements after grouping

library(dplyr)
df <- df %>%
    group_by(ID) %>% 
    mutate(value = first(test[complete.cases(test)]))

Upvotes: 2

Related Questions