Mefhisto1

Reputation: 2228

Finding the index of the first non-NA value for a specific column in a data frame

I have a data frame with multiple columns, and some of the data is missing (NA). I sorted the data frame by one column; the data is now ordered correctly, but the NAs are placed after all other values. I want to get the index of the last non-NA value in that column.

column1 column2
1       2
2       na
3       some data
4       some data
na      some data
na      some data
na      some data

So I want to get the index of the value 4 in column1. I tried

which(is.na(DF))

but it doesn't seem to return the NA values.

Upvotes: 2

Views: 3900

Answers (2)

mz3

Reputation: 93

I came to this thread because I needed to find the first non-NA value in each column of a data frame. Although the original question is actually about finding the last non-NA value in a column, the other answers helped me work out how to find the first one. I list both below in case someone else is wondering about the same thing.

Here is some sample data. Note that each column is assumed to be sorted already, so that its NAs sit at the beginning or the end of the column.

(df <- data.frame(c=c(NA,NA,13,14,15), 
             d=c(16,17,NA,NA,NA), 
             e=c(NA,NA,NA,NA,NA), 
             f=c(18,19,20,21,22)))
   c  d  e  f
1 NA 16 NA 18
2 NA 17 NA 19
3 13 NA NA 20
4 14 NA NA 21
5 15 NA NA 22

There are two ways to find the first non-NA value in each column. The first is to use a for loop:

x1 <- numeric(ncol(df))
for (j in seq_len(ncol(df))) {
  # for an all-NA column (e), min(integer(0)) is Inf with a warning, so the result is NA
  x1[j] <- df[,j][min(which(!is.na(df[,j])))]
}

> x1
[1] 13 16 NA 18

Or use sapply. On a vector, complete.cases gives the same result as !is.na.

(x2 <- sapply(seq_len(ncol(df)), function(x) df[,x] [min(which(!is.na(df[,x])))]))
[1] 13 16 NA 18
(x3 <- sapply(seq_len(ncol(df)), function(x) df[,x] [min(which(complete.cases(df[,x])))]))
[1] 13 16 NA 18

Similarly, there are two ways to find the last non-NA value in each column: replace min with max.

y1 <- numeric(ncol(df))
for (j in seq_len(ncol(df))) {
  # for an all-NA column (e), max(integer(0)) is -Inf with a warning, so the result is NA
  y1[j] <- df[,j][max(which(!is.na(df[,j])))]
}
> y1
[1] 15 17 NA 22

(y2 <- sapply(seq_len(ncol(df)), function(x) df[,x] [max(which(!is.na(df[,x])))]))
[1] 15 17 NA 22
(y3 <- sapply(seq_len(ncol(df)), function(x) df[,x] [max(which(complete.cases(df[,x])))]))
[1] 15 17 NA 22

Based on my testing, the two methods have similar speed.
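
One caveat with the versions above: for an all-NA column such as e, min() and max() are called on an empty vector, which still yields NA in the end but emits a warning. A minimal sketch of a warning-free variant follows. The helper names first_non_na and last_non_na are hypothetical and not part of the original answer, and the sample data frame is repeated so the snippet runs on its own.

```r
# Hypothetical helpers (names are illustrative, not from the answer above).
first_non_na <- function(v) {
  idx <- which(!is.na(v))
  if (length(idx) == 0) return(NA)  # all-NA column: return NA without a warning
  v[idx[1]]
}

last_non_na <- function(v) {
  idx <- which(!is.na(v))
  if (length(idx) == 0) return(NA)
  v[idx[length(idx)]]
}

# Same sample data as above
df <- data.frame(c = c(NA, NA, 13, 14, 15),
                 d = c(16, 17, NA, NA, NA),
                 e = c(NA, NA, NA, NA, NA),
                 f = c(18, 19, 20, 21, 22))

# sapply over a data frame iterates over its columns
first_vals <- sapply(df, first_non_na)
last_vals  <- sapply(df, last_non_na)
```

Here first_vals should hold 13, 16, NA, 18 and last_vals 15, 17, NA, 22, named by column. Unlike the min/max versions, this does not rely on indexing with Inf to produce the NA.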

Upvotes: 3

Matthew Lundberg

Reputation: 42629

It appears that you want this expression:

max(which(complete.cases(DF$column1)))
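
As a rough illustration, here is that expression applied to a data frame modelled on the one in the question. The exact values are assumptions reconstructed from the question's table, and last_idx is just an illustrative variable name. The sketch also shows why which(is.na(DF)) is hard to interpret: is.na on a whole data frame returns a logical matrix, so which() reports positions counted down the columns rather than row numbers of a single column.

```r
# Data modelled on the question; the values are illustrative assumptions.
DF <- data.frame(column1 = c(1, 2, 3, 4, NA, NA, NA),
                 column2 = c("2", NA, "some data", "some data",
                             "some data", "some data", "some data"))

# Row index of the last non-NA value in column1
last_idx <- max(which(complete.cases(DF$column1)))
last_idx              # 4
DF$column1[last_idx]  # 4

# On the whole data frame, positions run column by column:
# 5, 6, 7 are rows 5-7 of column1; 9 is row 2 of column2 (7 + 2)
na_pos <- which(is.na(DF))
na_pos                # 5 6 7 9
```

Restricting the test to one column, as in DF$column1, is what makes the result a plain row index.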

Upvotes: 3
