panpsych77

Reputation: 33

Stringr function or gsub() to find an x-digit string and extract the first x digits?

Regex and stringr newbie here. I have a data frame with a column from which I want to find 10-digit numbers and keep only the first three digits. Otherwise, I want to just keep whatever is there.

So to make it easy let's just pretend it's a simple vector like this:

new<-c("111", "1234567891", "12", "12345")

I want to write code that will return a vector with elements: 111, 123, 12, and 12345. I also need to write code (I'm assuming I'll do this iteratively) where I extract the first two digits of a 5-digit string, like the last element above.

I've tried:

gsub("\\d{10}", "", new)

but I don't know what I could put for the replacement argument to get what I'm looking for. Also tried:

str_replace(new, "\\d{10}", "")

But again I don't know what to put in for the replacement argument to get just the first x digits.

Edit: I disagree that this is a duplicate question, because I don't just want to extract the first X digits from a string; I need to do that only for strings that match a specific pattern (e.g., 10-digit strings).

Upvotes: 3

Views: 1583

Answers (3)

Wiktor Stribiżew

Reputation: 627546

You may use

new<-c("111", "1234567891", "12")
sub("^(\\d{3})\\d{7}$", "\\1", new)
## => [1] "111" "123" "12" 

Details

  • ^ - start of string anchor
  • (\d{3}) - Capturing group 1 (this value is accessed using \1 in the replacement pattern): three digit chars
  • \d{7} - seven digit chars
  • $ - end of string anchor.

So, the sub command only matches strings that consist of exactly 10 digits, captures the first three in Group 1, and replaces the whole match (which is the whole string) with the three digits captured in Group 1. Strings that do not match are returned unchanged.
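The same pattern extends to the question's second case (first two digits of a 5-digit string) by changing the two counts. Below is a minimal sketch that builds the pattern from those counts; the helper name keep_leading_digits and its arguments total and keep are hypothetical and not part of the answer above.

```r
# Hypothetical helper (not from the answer): for strings made up of exactly
# `total` digits, keep only the first `keep` digits; leave all others as-is.
keep_leading_digits <- function(x, total, keep) {
  stopifnot(keep >= 1, keep < total)
  # e.g. total = 10, keep = 3 gives "^(\\d{3})\\d{7}$"
  pattern <- sprintf("^(\\d{%d})\\d{%d}$", keep, total - keep)
  sub(pattern, "\\1", x)
}

new <- c("111", "1234567891", "12", "12345")

step1 <- keep_leading_digits(new, total = 10, keep = 3)
step1
# [1] "111"   "123"   "12"    "12345"

step2 <- keep_leading_digits(step1, total = 5, keep = 2)
step2
# [1] "111" "123" "12"  "12"
```

Because both anchors are part of the pattern, each call only touches strings of exactly the stated length, so the two steps can be applied one after the other.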

Upvotes: 1

bpbutti

Reputation: 393

If you are willing to use stringr, the package that provides the str_replace you are already using, you can use str_extract:

library(stringr)
vec <- c("111", "1234567891", "12")
str_extract(vec, "^\\d{1,3}")

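The regex ^\\d{1,3} matches between one and three digits at the very beginning of the string. str_extract, as the name implies, extracts and returns these matches. Note that this shortens any string that starts with more than three digits, not only 10-digit ones, so "12345" would become "123".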
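To leave strings that are not exactly 10 digits long untouched while staying in stringr, one option is str_replace with a capture group and a "\\1" backreference as the replacement. This is a sketch that assumes stringr is installed; the variable name result is only for illustration.

```r
library(stringr)

new <- c("111", "1234567891", "12", "12345")

# Only strings of exactly 10 digits match; "\\1" keeps the first three digits.
result <- str_replace(new, "^(\\d{3})\\d{7}$", "\\1")
result
# [1] "111"   "123"   "12"    "12345"
```

This also answers what to pass as the replacement argument in the question's own str_replace attempt: a reference to the captured group rather than an empty string.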

Upvotes: 2

NelsonGon

Reputation: 13319

You can use:

my_vec <- c("111", "1234567891", "12")
as.numeric(substring(my_vec, 1, 3))
#[1] 111 123  12
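On its own, substring() shortens every element longer than three characters, including "12345" from the question. A minimal base R sketch that applies it only to 10-digit strings, assuming the values are kept as character strings (the names is_ten and out are illustrative):

```r
new <- c("111", "1234567891", "12", "12345")

# TRUE only for elements that consist of exactly 10 digits
is_ten <- grepl("^\\d{10}$", new)

# Take the first three characters where the test is TRUE, else keep the value
out <- ifelse(is_ten, substr(new, 1, 3), new)
out
# [1] "111"   "123"   "12"    "12345"
```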

Upvotes: 1
