lgxqzz

Reputation: 123

Extract rows that contain sequence information in R

I'm having trouble with a data set. A few of its rows contain sequence information, while the other rows are mostly just NAs. The data looks like this:

> dat[90:100,]
           V1 V2 V3   V4  V5  V6                        V7 V8
90  Sequence: 90 NA   NA  NA                               NA
91  Sequence: 91 NA   NA  NA                               NA
92  Sequence: 92 NA   NA  NA                               NA
93  Sequence: 93 NA   NA  NA                               NA
94          1 25  3  8.3 3.0 100                         0 50
95          0  0 68 32.0 0.9 GGT GGTGGTGGTGGTGGTGGTGGTGGTG NA
96  Sequence: 94 NA   NA  NA                               NA
97  Sequence: 95 NA   NA  NA                               NA
98  Sequence: 96 NA   NA  NA                               NA
99  Sequence: 97 NA   NA  NA                               NA
100 Sequence: 98 NA   NA  NA                               NA

I would like to keep rows 93 to 95, which contain the sequence information, and remove the others:

93  Sequence: 93 NA   NA  NA                               NA
94          1 25  3  8.3 3.0 100                         0 50
95          0  0 68 32.0 0.9 GGT GGTGGTGGTGGTGGTGGTGGTGGTG NA

Is there any way I can do this in R, for example with a for loop?

Upvotes: 0

Views: 85

Answers (2)

maloneypatr

Reputation: 3622

Wouldn't it just be:

dat[!grepl('Sequence', dat$V1), ]

-- UPDATE --

Sorry about that, I didn't see that you wanted the row above as well. This should work.

rows <- which(!grepl('Sequence', dat$V1)) # positions of rows that don't contain 'Sequence'
rows2 <- rows - 1                         # the row just before each of them
rows2 <- sort(unique(c(rows2, rows)))     # de-dupe and restore the original row order
dat[rows2, ]                              # all the rows you want

#          V1 V2 V3   V4  V5  V6                        V7 V8
# 4 Sequence: 93 NA   NA  NA                               NA
# 5         1 25  3  8.3 3.0 100                         0 50
# 6         0  0 68 32.0 0.9 GGT GGTGGTGGTGGTGGTGGTGGTGGTG NA
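
If you'd rather not do index arithmetic, here is a minimal sketch of the same idea with logical vectors: keep every non-'Sequence' row, plus any 'Sequence' row that is immediately followed by one. The toy `dat` below is made up to mimic the layout in the question (only three columns, illustrative values), and the names `is_header`, `is_data`, `next_is_data` and `keep` are mine, so treat it as an illustration rather than a drop-in solution.

```r
# Toy data mimicking the question's layout (values are illustrative only)
dat <- data.frame(
  V1 = c("Sequence:", "Sequence:", "1", "0", "Sequence:", "Sequence:", "1", "0"),
  V2 = c(92, 93, 25, 0, 94, 95, 30, 0),
  V3 = c(NA, NA, 3, 68, NA, NA, 4, 70),
  stringsAsFactors = FALSE
)

is_header <- grepl("^Sequence", dat$V1)  # TRUE for 'Sequence:' rows
is_data   <- !is_header                  # TRUE for rows that carry values

# Shift is_data up by one position: TRUE where the *next* row is a data row
next_is_data <- c(is_data[-1], FALSE)

# Keep data rows, and headers that sit directly above a data row
keep <- is_data | (is_header & next_is_data)

result <- dat[keep, ]
print(result)
```

Because `keep` is a logical vector the same length as the data frame, the rows come back in their original order and nothing depends on the row names being 1..n.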

Upvotes: 0

SlowLearner

Reputation: 7997

If you want to remove the NA rows, look at the is.na function and invert it:

dat2 <- dat[!is.na(dat$V3), ]

If you just want a slice of the data frame, specify it like this:

dat2 <- dat[93:95, ]

But I think you already know how to do this, so it's not entirely clear to me what you're asking. I suspect you want to remove NA rows.
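
For what it's worth, here is a small sketch of what the is.na filter does on data shaped like the question's. The toy `dat` is invented for illustration (three columns only) and `values_only` is just a name I picked. Note that it drops every 'Sequence:' row, including the one directly above the values, so it only fits if that header row is not needed.

```r
# Toy data mimicking the question's layout (values are illustrative only)
dat <- data.frame(
  V1 = c("Sequence:", "Sequence:", "1", "0", "Sequence:"),
  V2 = c(92, 93, 25, 0, 94),
  V3 = c(NA, NA, 3, 68, NA),
  stringsAsFactors = FALSE
)

# Keep only the rows where V3 is not NA
values_only <- dat[!is.na(dat$V3), ]
print(values_only)
```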

Upvotes: 1
