Reputation: 33
Regex and stringr newbie here. I have a data frame with a column from which I want to find 10-digit numbers and keep only the first three digits. Otherwise, I want to just keep whatever is there.
So to make it easy let's just pretend it's a simple vector like this:
new<-c("111", "1234567891", "12", "12345")
I want to write code that will return a vector with elements: 111, 123, 12, and 12345. I also need to write code (I'm assuming I'll do this iteratively) where I extract the first two digits of a 5-digit string, like the last element above.
I've tried:
gsub("\\d{10}", "", new)
but I don't know what I could put for the replacement argument to get what I'm looking for. Also tried:
str_replace(new, "\\d{10}", "")
But again I don't know what to put in for the replacement argument to get just the first x digits.
Edit: I disagree that this is a duplicate question because it's not just that I want to extract the first X digits from a string but that I need to do that with specific strings that match a pattern (e.g., 10 digit strings.)
Upvotes: 3
Views: 1583
Reputation: 627546
You may use
new<-c("111", "1234567891", "12")
sub("^(\\d{3})\\d{7}$", "\\1", new)
## => [1] "111" "123" "12"
See the R online demo and the regex demo.
Details
^
- start of string anchor(\d{3})
- Capturing group 1 (this value is accessed using \1
in the replacement pattern): three digit chars\d{7}
- seven digit chars$
- end of string anchor.So, the sub
command only matches strings that are composed only of 10 digits, captures the first three into a separate group, and then replaces the whole string (as it is the whole match) with the three digits captured in Group 1.
Upvotes: 1
Reputation: 393
If you are willing to use the library stringr
from which comes the str_replace
you are using. Just use str_extract
vec <- c(111, 1234567891, 12)
str_extract(vec, "^\\d{1,3}")
The regex ^\\d{1,3}
matches at least 1 to a maximum of 3 digits occurring right in the beginning of the phrase. str_extract
, as the name implies, extracts and returns these matches.
Upvotes: 2