Reputation: 1285
Consider a data frame whose rows have uneven lengths and an unknown number of columns, i.e. each row may hold a different number of values, but all NA values are always at the end. There are also three values: start, penultimate and last.
Problem: how can I (elegantly, without nested loops) find all rows of the data frame whose first value is start and whose last two non-NA values are penultimate and last, in that order?
Example: For the following dataframe and values:
df <- structure(list(V1 = c("a", "a", "a", "a", "b"), V2 = c("b", "n", "t", "o", "l"), V3 = c("c", "m", "h", "j", "p"), V4 = c("d", "c", "j", "", "e"), V5 = c("", "d", "", "", "")),
.Names = c("V1", "V2", "V3", "V4", "V5"),
row.names = c(NA, 5L), class = "data.frame")
df[df == ""] <- NA
start <- "a"
penultimate <- "c"
last <- "d"
The desired output would be the following subset:
V1 V2 V3 V4 V5
1 a b c d [NA]
2 a n m c d
Upvotes: 2
Views: 206
Reputation: 13581
You can use regular expressions to your advantage here:
pattern <- paste0("^", start, ".*", penultimate, last, "$")
# "^a.*cd$"
index <- grepl(pattern, apply(df, 1, function(i) paste(i[!is.na(i)], collapse="")))
# [1] TRUE TRUE FALSE FALSE FALSE
df[index,]
# V1 V2 V3 V4 V5
# 1 a b c d <NA>
# 2 a n m c d
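The pattern above collapses each row into one string without separators, so it can misfire if a value is longer than one character or contains a regex metacharacter such as "." or "*". A minimal sketch of a separator-based variant follows; the tab separator is an assumption (it must not occur in the data), and the sample data frame is rebuilt with data.frame() only to keep the block self-contained. It uses the base R functions startsWith() and endsWith() instead of a regular expression.

```r
# Sample data from the question, rebuilt so the block runs on its own
df <- data.frame(
  V1 = c("a", "a", "a", "a", "b"),
  V2 = c("b", "n", "t", "o", "l"),
  V3 = c("c", "m", "h", "j", "p"),
  V4 = c("d", "c", "j", NA, "e"),
  V5 = c(NA, "d", NA, NA, NA),
  stringsAsFactors = FALSE
)
start <- "a"
penultimate <- "c"
last <- "d"

# Assumption: the separator never appears inside a cell value
sep <- "\t"

# Join the non-NA values of each row and wrap the result in separators,
# so every value is delimited on both sides
joined <- apply(df, 1, function(r) paste(r[!is.na(r)], collapse = sep))
joined <- paste0(sep, joined, sep)

# Literal prefix/suffix comparison, no regex metacharacters involved
index <- startsWith(joined, paste0(sep, start, sep)) &
  endsWith(joined, paste0(sep, penultimate, sep, last, sep))

result <- df[index, ]
```

Because the comparison is literal, values such as "a.b" or "cd" are treated as whole cells rather than as pattern fragments.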
Upvotes: 1
Reputation: 5201
Here's one way using base R:
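output <- apply(df, 1, function(row) {
  # position of the last non-NA value in this row
  index_last <- max(which(!is.na(row)))
  # keep the row only if all three scalar conditions hold
  if (row[1] == start && row[index_last - 1] == penultimate && row[index_last] == last) {
    return(row)
  }
  return(NULL)
})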
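This gives a list of the filtered rows (non-matching rows are NULL), which we can rbind back together. Note that the result is a character matrix; wrap it in as.data.frame() if you need a data.frame: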
> do.call(rbind, output)
V1 V2 V3 V4 V5
1 "a" "b" "c" "d" NA
2 "a" "n" "m" "c" "d"
Upvotes: 1
Reputation: 389125
I managed to solve it with apply with MARGIN = 1; however, I have doubts about its efficiency.
df[apply(df, 1, function(x) {
temp = x[!is.na(x)]
temp[1] == start & tail(temp, 1) == last & tail(temp, 2)[1] == penultimate
}), ]
# V1 V2 V3 V4 V5
#1 a b c d <NA>
#2 a n m c d
For each row, we first remove all the NA elements, then check the conditions (start, last and penultimate), and finally subset the rows using the resulting logical index.
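If the row-by-row apply call is a concern, one possible alternative is to avoid the per-row function entirely. The sketch below is not from the answers above; it assumes, as the question states, that NA values only appear at the end of each row, so the count of non-NA values in a row is also the column position of its last value. The names m, n, rows, ok and keep are illustrative.

```r
# Sample data from the question, rebuilt so the block runs on its own
df <- data.frame(
  V1 = c("a", "a", "a", "a", "b"),
  V2 = c("b", "n", "t", "o", "l"),
  V3 = c("c", "m", "h", "j", "p"),
  V4 = c("d", "c", "j", NA, "e"),
  V5 = c(NA, "d", NA, NA, NA),
  stringsAsFactors = FALSE
)
start <- "a"
penultimate <- "c"
last <- "d"

m <- as.matrix(df)
# With trailing NAs only, the number of non-NA values per row
# equals the column index of the last value in that row
n <- rowSums(!is.na(m))
rows <- seq_len(nrow(m))

# Rows with fewer than two values cannot have a penultimate element
ok <- n >= 2
keep <- rep(FALSE, nrow(m))

# Two-column matrix indexing picks one (row, column) cell per row
keep[ok] <- m[rows[ok], 1] == start &
  m[cbind(rows[ok], n[ok] - 1)] == penultimate &
  m[cbind(rows[ok], n[ok])] == last

result <- df[keep, ]
```

All three comparisons run once over whole vectors, which may scale better than calling an R function per row, though that is worth benchmarking on real data before relying on it.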
Upvotes: 2