Reputation: 123
I'm having trouble with a data set. Some rows contain sequence information, while the other rows are mostly "NA"s. The data looks like:
> dat[90:100,]
V1 V2 V3 V4 V5 V6 V7 V8
90 Sequence: 90 NA NA NA NA
91 Sequence: 91 NA NA NA NA
92 Sequence: 92 NA NA NA NA
93 Sequence: 93 NA NA NA NA
94 1 25 3 8.3 3.0 100 0 50
95 0 0 68 32.0 0.9 GGT GGTGGTGGTGGTGGTGGTGGTGGTG NA
96 Sequence: 94 NA NA NA NA
97 Sequence: 95 NA NA NA NA
98 Sequence: 96 NA NA NA NA
99 Sequence: 97 NA NA NA NA
100 Sequence: 98 NA NA NA NA
I would like to keep rows 93 to 95, which contain the sequence information, and remove the others:
93 Sequence: 93 NA NA NA NA
94 1 25 3 8.3 3.0 100 0 50
95 0 0 68 32.0 0.9 GGT GGTGGTGGTGGTGGTGGTGGTGGTG NA
Is there any way I can do this in R, for example with a for loop?
Upvotes: 0
Views: 85
Reputation: 3622
Wouldn't it just be:
dat[!grepl('Sequence', dat$V1), ]
-- UPDATE --
Sorry about that, I didn't see that you wanted the row above as well. This should work.
rows <- which(!grepl('Sequence', dat$V1)) # positions of rows that don't contain 'Sequence'
rows2 <- rows - 1 # positions of the preceding rows
rows2 <- sort(unique(c(rows2, rows))) # de-dupe and restore the original row order
dat[rows2, ] # all the rows you want
# V1 V2 V3 V4 V5 V6 V7 V8
# 4 Sequence: 93 NA NA NA NA
# 5 1 25 3 8.3 3.0 100 0 50
# 6 0 0 68 32.0 0.9 GGT GGTGGTGGTGGTGGTGGTGGTGGTG NA
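The same idea (keep every non-'Sequence' row plus the row directly above it) can also be sketched as a logical mask, which avoids row names and keeps the original order. This is only a minimal sketch on a made-up data frame: the toy values and the names is_seq, next_is_data and keep are assumptions for illustration, not the asker's real data.

```r
# Hypothetical toy data: 'Sequence:' header rows mixed with data rows
dat <- data.frame(
  V1 = c("Sequence:", "Sequence:", "1", "0", "Sequence:"),
  V2 = c(92, 93, 25, 0, 94),
  V3 = c(NA, NA, 3, 68, NA),
  stringsAsFactors = FALSE
)

# TRUE for header rows that start with 'Sequence'
is_seq <- grepl("^Sequence", dat$V1)

# TRUE when the following row is a data row (the last row has no follower)
next_is_data <- c(!is_seq[-1], FALSE)

# Keep data rows, plus any header row immediately followed by a data row
keep <- !is_seq | next_is_data
result <- dat[keep, ]
print(result)
```

On the toy data this keeps the second 'Sequence:' row and the two data rows below it, and drops the headers that have no data attached.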
Upvotes: 0
Reputation: 7997
If you want to remove the NA rows, look at the is.na function and invert it:
dat2 <- dat[!is.na(dat$V3), ]
If you just want a slice of the data frame, specify it like this:
dat2 <- dat[93:95, ]
But I think you already know how to do this, so it's not entirely clear to me what you're asking. I suspect you want to remove NA rows.
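To make the difference between the two options concrete, here is a small sketch on a made-up data frame (the values are illustrative assumptions, not the real data). Filtering on V3 also drops the 'Sequence:' header row, while the slice keeps it but needs the row positions to be known in advance.

```r
# Hypothetical toy data: 'Sequence:' header rows mixed with data rows
dat <- data.frame(
  V1 = c("Sequence:", "Sequence:", "1", "0", "Sequence:"),
  V2 = c(92, 93, 25, 0, 94),
  V3 = c(NA, NA, 3, 68, NA),
  stringsAsFactors = FALSE
)

# Option 1: drop every row whose V3 is NA (header rows are removed too)
no_na <- dat[!is.na(dat$V3), ]

# Option 2: take a fixed slice by position (keeps the header in row 2)
slice <- dat[2:4, ]

print(no_na)
print(slice)
```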
Upvotes: 1