Reputation: 13
I have a business.id column in a data frame called total_pop that contains only numbers with between 1 and 4 digits. I'm trying to extract the numbers that have exactly 4 digits AND ALSO begin with "13".
Sample Data:
sex age business.id
-------------------------
1 23 13
1 36 465
2 42 1309
1 19 1375
2 38 137
Desired Result:
sex age business.id
-------------------------
2 42 1309
1 19 1375
I've tried: grep("{4}^[1][3]",total_pop$business.id,value=T)
but it returns numbers with any number of digits that start with 13, so it returns 136 and 13.
Upvotes: 1
Views: 508
Reputation: 1486
library(tidyverse)
df <- tibble::tribble(
  ~sex, ~age, ~business.id,
    1L,  23L,          13L,
    1L,  36L,         465L,
    2L,  42L,        1309L,
    1L,  19L,        1375L,
    2L,  38L,         137L
)
df %>%
  # Anchor the pattern so only ids that are exactly "13" plus two digits match
  filter(str_detect(business.id, "^13\\d{2}$"))
#> # A tibble: 2 x 3
#> sex age business.id
#> <int> <int> <int>
#> 1 2 42 1309
#> 2 1 19 1375
Upvotes: 0
Reputation: 270378
1) nchar counts the number of characters and substr extracts the first two characters.
subset(total_pop, nchar(business.id) == 4 & substr(business.id, 1, 2) == 13)
## sex age business.id
## 3 2 42 1309
## 4 1 19 1375
2) We can use a regular expression with grepl to pick out the values of interest. ^ matches the start of business.id, .. matches any two characters and $ matches the end.
subset(total_pop, grepl("^13..$", business.id))
## sex age business.id
## 3 2 42 1309
## 4 1 19 1375
The input in reproducible form:
total_pop <- structure(list(sex = c(1L, 1L, 2L, 1L, 2L), age = c(23L, 36L,
42L, 19L, 38L), business.id = c(13L, 465L, 1309L, 1375L, 137L
)), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 1
Reputation: 522762
I would handle this numerically:
df[df$business.id >= 1000 & floor(df$business.id / 100) == 13, ]
sex age business.id
3 2 42 1309
4 1 19 1375
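As a small sketch of the same numeric idea, the test can also be written with integer division or as a plain range check. The vector business_id below is just the sample column from the question copied out for illustration, and the names keep_div and keep_range are made up here:

```r
# Sample ids from the question, copied into a plain vector for illustration
business_id <- c(13L, 465L, 1309L, 1375L, 137L)

# Integer division drops the last two digits, so 1309 %/% 100 is 13
keep_div <- business_id %/% 100 == 13

# Equivalent range check: four-digit numbers starting with 13 are 1300..1399
keep_range <- business_id >= 1300 & business_id <= 1399

business_id[keep_div]
#> [1] 1309 1375
```

Both logical vectors select the same rows here, so either could be used to index the data frame.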
If you wanted to handle this using business.id as a string, then we could use grepl:
df[grepl("^13\\d{2}$", df$business.id), ]
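To see what the ^ and $ anchors buy you, here is a rough comparison on the sample ids plus two longer ids. The two five-digit values are hypothetical and do not occur in the question's data, where ids have at most 4 digits:

```r
# Sample ids from the question, plus two hypothetical five-digit ids
ids <- c(13L, 465L, 1309L, 1375L, 137L, 21309L, 13750L)

# Unanchored: matches "13" followed by two digits anywhere in the id
unanchored <- grepl("13\\d{2}", ids)

# Anchored: the whole id must be "13" followed by exactly two digits
anchored <- grepl("^13\\d{2}$", ids)

ids[unanchored]
#> [1]  1309  1375 21309 13750
ids[anchored]
#> [1] 1309 1375
```

With ids capped at 4 digits both patterns give the same answer, but the anchored form stays correct if longer ids ever show up.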
Upvotes: 1