Reputation: 2077
I would like to filter all rows between two patterns that follow a numerical order. For example, how could I filter all rows > 1st.7.1.* & < 1st.13.1.*?
Here is what the data frame looks like:
Upvotes: 0
Views: 116
Reputation: 79271
We could remove the constant 1st. prefix and use the numbers. Here I changed the range to show the effect on the provided data.
library(dplyr)
library(stringr)
df %>%
  # drop the literal "1st." prefix (dot escaped), then compare the rest as a number
  filter(between(as.numeric(str_remove(ball, "^1st\\.")), 0.1, 1.1))
     ball        team     batsman              bowler  nonStriker byes legbyes noballs
1 1st.0.1 New Zealand  MJ Guptill Shaheen Shah Afridi DJ Mitchell    0       0       0
2 1st.0.2 New Zealand  MJ Guptill Shaheen Shah Afridi DJ Mitchell    0       0       0
3 1st.0.3 New Zealand  MJ Guptill Shaheen Shah Afridi DJ Mitchell    0       0       0
4 1st.0.4 New Zealand  MJ Guptill Shaheen Shah Afridi DJ Mitchell    0       0       0
5 1st.0.5 New Zealand  MJ Guptill Shaheen Shah Afridi DJ Mitchell    0       0       0
6 1st.0.6 New Zealand  MJ Guptill Shaheen Shah Afridi DJ Mitchell    0       0       0
7 1st.1.1 New Zealand DJ Mitchell          Imad Wasim  MJ Guptill    0       0       0
df <- structure(list(ball = c("1st.0.1", "1st.0.2", "1st.0.3", "1st.0.4",
"1st.0.5", "1st.0.6", "1st.1.1", "1st.1.2", "1st.1.3", "1st.1.4",
"1st.1.5", "1st.1.6", "1st.2.1", "1st.2.2"), team = c("New Zealand",
"New Zealand", "New Zealand", "New Zealand", "New Zealand", "New Zealand",
"New Zealand", "New Zealand", "New Zealand", "New Zealand", "New Zealand",
"New Zealand", "New Zealand", "New Zealand"), batsman = c("MJ Guptill",
"MJ Guptill", "MJ Guptill", "MJ Guptill", "MJ Guptill", "MJ Guptill",
"DJ Mitchell", "DJ Mitchell", "MJ Guptill", "MJ Guptill", "DJ Mitchell",
"MJ Guptill", "DJ Mitchell", "DJ Mitchell"), bowler = c("Shaheen Shah Afridi",
"Shaheen Shah Afridi", "Shaheen Shah Afridi", "Shaheen Shah Afridi",
"Shaheen Shah Afridi", "Shaheen Shah Afridi", "Imad Wasim", "Imad Wasim",
"Imad Wasim", "Imad Wasim", "Imad Wasim", "Imad Wasim", "Shaheen Shah Afridi",
"Shaheen Shah Afrid"), nonStriker = c("DJ Mitchell", "DJ Mitchell",
"DJ Mitchell", "DJ Mitchell", "DJ Mitchell", "DJ Mitchell", "MJ Guptill",
"MJ Guptill", "DJ Mitchell", "DJ Mitchell", "MJ Guptill", "DJ Mitchell",
"MJ Guptill", "MJ Guptill"), byes = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), legbyes = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), noballs = c(0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-14L))
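One caveat with reading over.ball as a decimal: an over that runs to ten or more deliveries (say a hypothetical 1st.7.10) would parse as 7.1 and collide with 1st.7.1. A minimal sketch that compares over and ball as separate integers instead, assuming every value has the form innings.over.ball. The helper name ball_key and the sample vector are made up for illustration and are not part of the answer above:

```r
# Hypothetical helper: turn "1st.7.10" into a sortable integer key (over * 100 + ball).
# Assumes the format <innings>.<over>.<ball> and fewer than 100 balls per over.
ball_key <- function(ball) {
  parts <- strsplit(ball, ".", fixed = TRUE)
  over  <- as.integer(vapply(parts, `[`, character(1), 2))
  nth   <- as.integer(vapply(parts, `[`, character(1), 3))
  over * 100L + nth
}

# Made-up sample values, including a tenth delivery in over 7
balls <- c("1st.7.1", "1st.7.2", "1st.7.10", "1st.12.6", "1st.13.1")
key   <- ball_key(balls)

# Strictly after 1st.7.1 and strictly before 1st.13.1
kept <- balls[key > ball_key("1st.7.1") & key < ball_key("1st.13.1")]
print(kept)
```

With plain decimal parsing, 1st.7.10 would be treated as equal to 1st.7.1 and dropped by a strict comparison; the integer key keeps it in the right place.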
Upvotes: 2
Reputation: 21440
You can extract the numeric part and subset on it:
library(dplyr)
library(stringr)
df %>%
  # everything after "st." becomes a number, e.g. "1st.7.9" -> 7.9
  mutate(num = as.numeric(str_extract(ball, "(?<=st\\.).*"))) %>%
  filter(num > 7.1 & num < 13.1) %>%
  select(-num)
      ball
1  1st.7.9
2 1st.12.7
Data:
df <- data.frame(
ball = c("1st.7.1","1st.7.9", "1st.12.7", "1st.13.1")
)
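The same idea also works without stringr or dplyr. A base R sketch on the sample data above, using sub() to drop the prefix; the pattern ^1st\\. assumes every value starts with the literal 1st. prefix, and the names num and res are only illustrative:

```r
df <- data.frame(
  ball = c("1st.7.1", "1st.7.9", "1st.12.7", "1st.13.1"),
  stringsAsFactors = FALSE
)

# Strip the literal "1st." prefix and convert the remainder to a number
num <- as.numeric(sub("^1st\\.", "", df$ball))

# Keep rows strictly between 7.1 and 13.1; drop = FALSE keeps a data frame
res <- df[num > 7.1 & num < 13.1, , drop = FALSE]
print(res$ball)
```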
Upvotes: 1
Reputation: 887991
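We may use parse_number to get the numeric part and then filter. Because parse_number reads only the first number it finds in a string (the 1 in 1st.), the prefix is removed first:
library(dplyr)
df1 %>%
  filter(between(readr::parse_number(stringr::str_remove(ball, "^1st\\.")), 7.1, 13.1))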
Or another option is to extract the substring and filter
library(stringr)
df1 %>%
filter(between(as.numeric(str_extract(ball, "\\d+(\\.\\d+)?$")), 7.1, 13.1))
-output
# A tibble: 61 × 2
   ball    team
   <chr>   <chr>
 1 1st.7.1 New Zealand
 2 1st.7.2 New Zealand
 3 1st.7.3 New Zealand
 4 1st.7.4 New Zealand
 5 1st.7.5 New Zealand
 6 1st.7.6 New Zealand
 7 1st.7.7 New Zealand
 8 1st.7.8 New Zealand
 9 1st.7.9 New Zealand
10 1st.8   New Zealand
# … with 51 more rows
df1 <- tibble(ball = str_c('1st.', seq(0.1, 13.5, by = 0.1)), team = 'New Zealand')
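Note that between() keeps both endpoints, while the question asks for rows strictly greater than 1st.7.1 and strictly less than 1st.13.1. A small sketch of the difference, using a made-up numeric vector x rather than the data above:

```r
library(dplyr)

# Illustrative values only: the two endpoints plus two interior points
x <- c(7.1, 7.2, 13.0, 13.1)

# between() is inclusive on both sides: x >= 7.1 & x <= 13.1
inclusive <- between(x, 7.1, 13.1)

# Strict comparison drops the endpoints
exclusive <- x > 7.1 & x < 13.1

print(inclusive)
print(exclusive)
```

If the endpoints should be excluded, swapping between() for the explicit x > left & x < right form inside filter() is one way to do it.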
Upvotes: 3