I'm wondering whether there is a better way to do this, or whether I might run into some unforeseen trouble. I need to subset a data frame, but I don't want to use the column names; I need to do it by referencing column numbers.
data <- data.frame(col1 = c(50, 20, NA, 100, 50),
                   col2 = c(NA, 25, 125, 50, NA),
                   col3 = c(NA, 100, 15, 55, 25),
                   col4 = c(NA, 30, 125, 100, NA),
                   col5 = c(80, 25, 75, 40, NA))
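Because everything here hinges on selecting columns by number, it may help to see what the common indexing forms return for this data frame. The sketch below only uses base R; the variable names `by_single`, `by_double`, `by_matrix` and `by_range` are illustrative.

```r
data <- data.frame(col1 = c(50, 20, NA, 100, 50),
                   col2 = c(NA, 25, 125, 50, NA),
                   col3 = c(NA, 100, 15, 55, 25),
                   col4 = c(NA, 30, 125, 100, NA),
                   col5 = c(80, 25, 75, 40, NA))

by_single <- data[2]     # list-style indexing: a one-column data frame
by_double <- data[[2]]   # extracts column 2 as a plain numeric vector
by_matrix <- data[, 2]   # matrix-style: a vector, since drop = TRUE for one column
by_range  <- data[2:4]   # a data frame holding columns 2, 3 and 4

class(by_single)         # "data.frame"
class(by_double)         # "numeric"
is.na(by_single)         # is.na() on a data frame returns a logical matrix
```

So `is.na(data[2])` works, but it yields a one-column matrix rather than a vector, which is harmless when the pieces are combined with `&`.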
Suppose I want to subset the data frame and keep only the rows that have NAs in three consecutive columns (columns 2 to 4) followed by a valid number in column 5. The best I can come up with without using column names is this:
sub <- data[(which(is.na(data[2]) &
                   is.na(data[3]) &
                   is.na(data[4]) &
                   !is.na(data[5]))), ]
Does anyone see any trouble with this, or know of a better way? I'm worried about nesting subsets within subsets, although everything appears to be working as it should.
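One possible simplification, sketched under the assumption that the row condition can never be `NA` (true here, because `is.na()` only ever returns `TRUE` or `FALSE`): the logical vector can index the rows directly, so the `which()` wrapper is not needed. Using `[[` pulls each column out as a plain vector; `keep` is just an illustrative name.

```r
data <- data.frame(col1 = c(50, 20, NA, 100, 50),
                   col2 = c(NA, 25, 125, 50, NA),
                   col3 = c(NA, 100, 15, 55, 25),
                   col4 = c(NA, 30, 125, 100, NA),
                   col5 = c(80, 25, 75, 40, NA))

# One TRUE/FALSE per row: columns 2-4 are all NA and column 5 is not
keep <- is.na(data[[2]]) & is.na(data[[3]]) & is.na(data[[4]]) & !is.na(data[[5]])

# keep contains no NAs, so it can select rows without which()
sub <- data[keep, ]
sub
```

The nesting itself is not a problem: each `data[n]` or `data[[n]]` is evaluated independently before the outer `[` runs.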
If you're looking to condense your code a little, you can do something like:
> data[rowSums(is.na(data[2:4])) == 3 & !is.na(data[5]), ]
  col1 col2 col3 col4 col5
1   50   NA   NA   NA   80
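If the same check is needed for other column ranges, the `rowSums()` idea can be wrapped in a small function so the column numbers become arguments. This is only a sketch: `na_run_rows`, `na_cols` and `value_col` are hypothetical names, and comparing against `length(na_cols)` instead of a literal `3` assumes the requirement is "every one of these columns is NA".

```r
data <- data.frame(col1 = c(50, 20, NA, 100, 50),
                   col2 = c(NA, 25, 125, 50, NA),
                   col3 = c(NA, 100, 15, 55, 25),
                   col4 = c(NA, 30, 125, 100, NA),
                   col5 = c(80, 25, 75, 40, NA))

# Hypothetical helper: keep rows where all columns in na_cols are NA
# and column value_col holds a non-missing value
na_run_rows <- function(df, na_cols, value_col) {
  # is.na() on a data frame gives a logical matrix; rowSums() counts TRUEs per row
  all_na <- rowSums(is.na(df[na_cols])) == length(na_cols)
  # [[ ]] extracts the value column as a plain vector
  has_value <- !is.na(df[[value_col]])
  df[all_na & has_value, , drop = FALSE]
}

na_run_rows(data, 2:4, 5)  # same result as the one-liner above
na_run_rows(data, 2, 3)    # column 2 is NA, column 3 is not
```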