Reputation: 2228
I have a data frame with multiple columns, and some of the data is missing (NA). I sorted the data frame by one column; the values are now ordered correctly, but the NAs are placed last. I want to get the index of the last non-NA value.
column1 column2
1 2
2 na
3 some data
4 some data
na some data
na some data
na some data
So I want to get the index of the row where column1 is 4, the last non-NA value. I tried
which(is.na(DF))
but it doesn't seem to return the NA values.
Upvotes: 2
Views: 3900
Reputation: 93
I came to this thread because I needed to find the first non-NA value in each column of a data frame. Although the original question is about finding the last non-NA value in a column, I was able to work out how to find the first one from the other answers. I list both below in case someone else is wondering about the same thing.
Here is some sample data. Note that each column is assumed to be sorted already, with its NAs at the beginning or the end.
(df <- data.frame(c = c(NA, NA, 13, 14, 15),
                  d = c(16, 17, NA, NA, NA),
                  e = c(NA, NA, NA, NA, NA),
                  f = c(18, 19, 20, 21, 22)))
   c  d  e  f
1 NA 16 NA 18
2 NA 17 NA 19
3 13 NA NA 20
4 14 NA NA 21
5 15 NA NA 22
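There are two ways to find the first non-NA value in each column. The first is to use a for loop:
x1 <- vector("numeric")
for (j in seq_len(ncol(df))) {
  # which() gives the positions of the non-NA values; min() picks the first one.
  # For the all-NA column e, min() warns and the result is NA.
  x1[j] <- df[, j][min(which(!is.na(df[, j])))]
}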
> x1
[1] 13 16 NA 18
The second is to use sapply. On a vector, complete.cases gives the same result as !is.na.
(x2 <- sapply(seq_len(ncol(df)), function(x) df[, x][min(which(!is.na(df[, x])))]))
[1] 13 16 NA 18
(x3 <- sapply(seq_len(ncol(df)), function(x) df[, x][min(which(complete.cases(df[, x])))]))
[1] 13 16 NA 18
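Similarly, there are two ways to find the last non-NA value in each column: use max instead of min.
y1 <- vector("numeric")
for (j in seq_len(ncol(df))) {
  # max() picks the position of the last non-NA value
  y1[j] <- df[, j][max(which(!is.na(df[, j])))]
}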
> y1
[1] 15 17 NA 22
(y2 <- sapply(seq_len(ncol(df)), function(x) df[, x][max(which(!is.na(df[, x])))]))
[1] 15 17 NA 22
(y3 <- sapply(seq_len(ncol(df)), function(x) df[, x][max(which(complete.cases(df[, x])))]))
[1] 15 17 NA 22
Based on my testing, the for loop and the sapply versions run at similar speed.
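One wrinkle in the snippets above is the all-NA column e: min() and max() are called on an empty index vector, which raises a warning before the result falls back to NA. Here is a minimal sketch that avoids the warning by checking for the empty case explicitly. The helper names first_non_na and last_non_na are my own, hypothetical names, not something from base R or the other answers.

```r
# Same sample data as above
df <- data.frame(c = c(NA, NA, 13, 14, 15),
                 d = c(16, 17, NA, NA, NA),
                 e = c(NA, NA, NA, NA, NA),
                 f = c(18, 19, 20, 21, 22))

# Hypothetical helper: first non-NA value of a vector, NA if there is none
first_non_na <- function(v) {
  idx <- which(!is.na(v))
  if (length(idx) == 0) NA_real_ else v[idx[1]]
}

# Hypothetical helper: last non-NA value of a vector, NA if there is none
last_non_na <- function(v) {
  idx <- which(!is.na(v))
  if (length(idx) == 0) NA_real_ else v[idx[length(idx)]]
}

# vapply iterates over the columns and checks that each result is one number
first_vals <- vapply(df, first_non_na, numeric(1))
last_vals  <- vapply(df, last_non_na, numeric(1))
first_vals
last_vals
```

Unlike the sapply calls above, vapply keeps the column names on the result, so it is easy to see which value belongs to which column. This sketch assumes all columns are numeric (or all NA); for mixed column types, lapply would probably be the safer choice.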
Upvotes: 3
Reputation: 42629
It appears that you want this expression:
max(which(complete.cases(DF$column1)))
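As a quick illustration, here is that expression applied to a small data frame. The data is my own reconstruction of the table in the question (assuming the "na" entries are real NA values), so treat it as a sketch rather than the asker's actual data.

```r
# Hypothetical reconstruction of the question's data, sorted with NAs last
DF <- data.frame(
  column1 = c(1, 2, 3, 4, NA, NA, NA),
  column2 = c("2", NA, "some data", "some data",
              "some data", "some data", "some data")
)

# Row index of the last non-NA value in column1
last_idx <- max(which(complete.cases(DF$column1)))
last_idx              # 4
DF$column1[last_idx]  # 4

# For a single vector, !is.na() gives the same index
same_idx <- max(which(!is.na(DF$column1)))

# By contrast, is.na() on the whole data frame returns a logical matrix,
# so which() reports positions counted down the columns, not row numbers
which(is.na(DF))
```

That last line is probably why which(is.na(DF)) looked wrong in the question: it counts positions across the entire data frame rather than within column1.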
Upvotes: 3