Simon1723

Reputation: 79

Subset dataset by selecting a span of rows beginning and ending with certain value or deleting rows before and after certain values

I couldn't find an existing question that covers my problem. My real dataset is large, so I created a simpler example that you can use:

structure(list(id = 123:182, tag = c(1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3), a = c(3, 3, 5, 9, 1, 9, 9, 5, 
5, 1, 1, 1, 5, 3, 9, 3, 5, 9, 3, 9, 9, 1, 5, 1, 3, 3, 1, 3, 9, 
3, 3, 5, 3, 1, 9, 5, 9, 1, 5, 3, 9, 5, 9, 5, 5, 9, 1, 3, 5, 5, 
3, 9, 3, 1, 1, 1, 3, 5, 5, 3), b = c(0, 0, 0, 0, 1, 1, 0, 0, 
1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 
0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 
1, 0, 1, 1, 1, 1, 0, 0, 0, 0)), .Names = c("id", "tag", "a", 
"b"), row.names = c(NA, -60L), class = "data.frame")

I want to subset the data by tag: within each tag, the leading and trailing rows where b is 0 should be deleted, so that each group runs from its first b == 1 to its last b == 1. I started trying something with the ddplr function, but it doesn't work and isn't worth showing.

The result should look like this:

     id tag a   b
5   127 1   1   1
6   128 1   9   1
7   129 1   9   0
8   130 1   5   0
9   131 1   5   1
10  132 1   1   0
11  133 1   1   1
12  134 1   1   0
13  135 1   5   1
14  136 1   3   0
15  137 1   9   0
16  138 1   3   1
24  146 2   1   1
25  147 2   3   0
26  148 2   3   1
27  149 2   1   0
28  150 2   3   1
29  151 2   9   1
30  152 2   3   0
31  153 2   3   1
32  154 2   5   0
33  155 2   3   1
34  156 2   1   1
35  157 2   9   1
36  158 2   5   1
45  167 3   5   1
46  168 3   9   1
47  169 3   1   0
48  170 3   3   0
49  171 3   5   1
50  172 3   5   0
51  173 3   3   1
52  174 3   9   0
53  175 3   3   1
54  176 3   1   1
55  177 3   1   1
56  178 3   1   1

What can I do?

Upvotes: 1

Views: 384

Answers (1)

G. Grothendieck

Reputation: 269481

If dd is your data frame try this:

w <- which(dd$b == 1)
dd[min(w):max(w), ]

To do it by tag try this:

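# b.ok is a logical vector for one tag (TRUE where b == 1).
# Returns TRUE for every position from the first TRUE to the last TRUE.
is.ok <- function(b.ok) {
    if (any(b.ok)) {
        w <- which(b.ok)
        seq_along(b.ok) %in% min(w):max(w)
    } else FALSE  # no b == 1 in this tag: drop all of its rows
}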
ok <- ave(dd$b == 1, dd$tag, FUN = is.ok)
dd[ok, ]
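
For comparison, here is a minimal sketch of the same idea written with split() and lapply() instead of ave(). It is an alternative formulation, not the code above; the helper name trim_zeros is made up for illustration, and dd is rebuilt from the question's sample data with the unused column a left out.

```r
# Sample data from the question (column a omitted, it is not needed here).
dd <- data.frame(
  id  = 123:182,
  tag = rep(1:3, each = 20),
  b   = c(0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0,
          0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0,
          0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0)
)

# Hypothetical helper: keep the rows of one group from the first
# b == 1 to the last b == 1; return zero rows if b is never 1.
trim_zeros <- function(d) {
  w <- which(d$b == 1)
  if (length(w) == 0) return(d[0, , drop = FALSE])
  d[min(w):max(w), , drop = FALSE]
}

# Apply the helper to each tag separately and stack the pieces again.
res <- do.call(rbind, lapply(split(dd, dd$tag), trim_zeros))
res
```

On the sample data this keeps ids 127 to 138, 146 to 158 and 167 to 178, which are the rows shown in the question. Note that rbind() on a split list changes the row names (they get the tag as a prefix), so use the ave() version if you need the original row names.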

UPDATE: by tag

Upvotes: 1
