I recently received an answer about subsetting a range of rows based on start and stop values (identifiers) in a specific column; that answer can be read here.
This time I'd like help doing the same thing (i.e. subsetting all rows between the two instances of the identifier), except that the identifier is embedded in a sentence: it sits inside a cell together with other text.
Example:
X1                        X2
'hello this is a test'    1
'identifier 1234'         2
'hello'                   3
'hello'                   4
'hello 1234'              5
'hello again'             6
Assuming the identifier for the rows I want to subset is '1234', the output I'm hoping for is 2, 3, 4, 5. The identifier never appears more than twice, so there are clear start and stop points.
I have tried combining filter, grepl and between but have only managed to filter the rows with the identifier, and not the rows in between the identifiers.
I hope this makes sense!
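The filter/grepl/between idea from the question can work if between() is applied to row positions rather than to the text. A minimal base-R sketch, assuming the identifier is a literal string (hence fixed = TRUE) and appears exactly twice; the variable names first_hit, last_hit and subset_df are illustrative only:

```r
# Example data from the question
df1 <- data.frame(
  X1 = c("hello this is a test", "identifier 1234", "hello",
         "hello", "hello 1234", "hello again"),
  X2 = 1:6
)

# TRUE where the cell text contains the identifier;
# fixed = TRUE treats '1234' as a literal string, not a regex
hits <- grepl("1234", df1$X1, fixed = TRUE)

# Row positions of the first (start) and last (stop) match
first_hit <- min(which(hits))
last_hit  <- max(which(hits))

# Keep rows whose position lies between start and stop (inclusive),
# the same inclusive range check that dplyr::between() performs
rows <- seq_len(nrow(df1))
subset_df <- df1[rows >= first_hit & rows <= last_hit, ]
subset_df$X2
```

With dplyr loaded, the same mask should be expressible as filter(df1, between(row_number(), first_hit, last_hit)); the key point is that grepl() locates the identifier rows, while between() is applied to row numbers.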
As the identifier appears exactly twice, once marking the start and once the stop, use grep to get the row indices that match the pattern, build the sequence between start and stop with the colon operator (:), and subset the 'X2' values:
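# row indices of the cells containing '1234' (start and stop)
i1 <- grep('1234', df1$X1)
# take every X2 value from the start row through the stop row
df1$X2[i1[1]:i1[2]]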
#[1] 2 3 4 5
# data
df1 <- structure(list(X1 = c("hello this is a test", "identifier 1234",
"hello", "hello", "hello 1234", "hello again"), X2 = 1:6),
class = "data.frame", row.names = c(NA, -6L))
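Since the question asks for the rows themselves rather than only the X2 values, the same grep-plus-sequence idea can return whole rows. A minimal sketch with a hypothetical helper, rows_between_id(), assuming the identifier is a literal substring and appears exactly twice (it stops with an error otherwise):

```r
# Hypothetical helper (not part of the answer above): return every row of
# `df` from the first to the second cell of column `col` containing `id`.
rows_between_id <- function(df, col, id) {
  # Positions of cells containing the identifier as a literal substring
  idx <- grep(id, df[[col]], fixed = TRUE)
  # The question guarantees exactly two matches; fail loudly otherwise
  if (length(idx) != 2) {
    stop("expected exactly 2 matches for '", id, "', found ", length(idx))
  }
  # drop = FALSE keeps a data frame even if df has a single column
  df[idx[1]:idx[2], , drop = FALSE]
}

df1 <- structure(list(X1 = c("hello this is a test", "identifier 1234",
  "hello", "hello", "hello 1234", "hello again"), X2 = 1:6),
  class = "data.frame", row.names = c(NA, -6L))

out <- rows_between_id(df1, "X1", "1234")
out$X2  # 2 3 4 5
```

Using fixed = TRUE avoids surprises if the identifier contains regex metacharacters such as '.'; note that a plain substring match would also hit a value like '12345', so a pattern with word boundaries (e.g. '\\b1234\\b', without fixed = TRUE) may be safer if that can occur.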