Neal Barsch

Reputation: 2940

Locate position of first number in string [R]

How can I create a function in R that locates the word position of the first number in a string?

For example:

string1 <- "Hello I'd like to extract where the first 1010 is in this string"
#desired_output for string1
9

string2 <- "80111 is in this string"
#desired_output for string2
1

string3 <- "extract where the first 97865 is in this string"
#desired_output for string3
5

Upvotes: 1

Views: 1373

Answers (6)

ekoam

Reputation: 8844

Here is another approach. We can trim off everything after the first digit of the first number, then count the words that remain: that count is the position of the number. \\b matches word boundaries, while \\S+ matches one or more non-whitespace characters.

first_numeric_word <- function(x) {
  x <- substr(x, 1L, regexpr("\\b\\d+\\b", x))
  lengths(gregexpr("\\b\\S+\\b", x))
}

Output

> first_numeric_word(x)
[1] 9 1 5

Data

x <- c(
  "Hello I'd like to extract where  the first 1010 is in this string", 
  "80111 is in this string", 
  "extract where the   first  97865 is in this string"
)
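A minimal sketch of the same two steps, which also prints the truncated strings so the word count is easy to check. The NA guard is an addition, not part of the answer: when a string has no whole-word number, regexpr() returns -1, substr() yields "", and lengths(gregexpr(...)) on an empty string still reports 1. The function name first_numeric_word_safe and the test string "no digits here" are hypothetical.

```r
# Hypothetical variant of first_numeric_word() that returns NA for strings
# without a whole-word number (the guard is an assumption, not from the answer).
first_numeric_word_safe <- function(x) {
  start <- regexpr("\\b\\d+\\b", x)                  # char index of first number, -1 if none
  truncated <- substr(x, 1L, start)                  # keep text up to its first digit
  out <- lengths(gregexpr("\\b\\S+\\b", truncated))  # words left = position of the number
  out[start == -1L] <- NA_integer_                   # no number found: report NA, not 1
  out
}

x2 <- c(
  "Hello I'd like to extract where  the first 1010 is in this string",
  "80111 is in this string",
  "no digits here"
)
substr(x2, 1L, regexpr("\\b\\d+\\b", x2))  # inspect the truncated strings
first_numeric_word_safe(x2)                # returns c(9L, 1L, NA)
```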

Upvotes: 1

semaphorism

Reputation: 866

Try the following:

library(stringr)

position_first_number <- function(string) {
  min(which(str_detect(str_split(string, "\\s+", simplify = TRUE), "[0-9]+")))
}

With your example strings:

> string1 <- "Hello I'd like to extract where the first 1010 is in this string"
> position_first_number(string1)
[1] 9
 
> string2 <- "80111 is in this string"
> position_first_number(string2)
[1] 1
 
> string3 <- "extract where the first 97865 is in this string"
> position_first_number(string3)
[1] 5
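position_first_number() handles one string at a time: given a character vector, str_split(..., simplify = TRUE) returns a matrix with one row per string, and which() on that matrix returns column-major positions, so min() no longer gives a per-string answer. A minimal base-R sketch of a vectorised version follows; the name position_first_number_vec is hypothetical, and returning NA when no word contains a digit is an added assumption.

```r
# Hypothetical vectorised variant: split each string on runs of whitespace,
# then return the index of the first word that contains a digit (NA if none).
position_first_number_vec <- function(strings) {
  words <- strsplit(strings, "\\s+")
  vapply(words, function(w) {
    hits <- which(grepl("[0-9]", w))
    if (length(hits) == 0L) NA_integer_ else hits[1L]
  }, integer(1))
}

strings <- c("Hello I'd like to extract where the first 1010 is in this string",
             "80111 is in this string",
             "extract where the first 97865 is in this string",
             "no numbers here")
position_first_number_vec(strings)  # returns c(9L, 1L, 5L, NA)
```

vapply() is used instead of sapply() so that each string is guaranteed to produce exactly one integer.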

Upvotes: 0

Edo

Reputation: 7858

Here I'll leave a fully tidyverse approach:

library(purrr)
library(stringr)

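# str_which() returns every matching position; [1] keeps the first, so strings
# with several numbers (or none, giving NA) don't make map_dbl() fail
map_dbl(str_split(strings, " "), ~ str_which(.x, "\\d+")[1])
#> [1] 9 1 5

map_dbl(str_split(strings[1], " "), ~ str_which(.x, "\\d+")[1])
#> [1] 9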

Note that it works with both a single string and a vector of strings.


Where strings is:

strings <- c("Hello I'd like to extract where the first 1010 is in this string",
             "80111 is in this string",
             "extract where the first 97865 is in this string")
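Note that "\\d+" matches any word that merely contains a digit, so a token such as "2nd" is counted as a number. If only words made entirely of digits should count, anchoring the pattern is one option. A base-R sketch of the difference, using a hypothetical example sentence:

```r
# Hypothetical sentence in which one word mixes digits and letters.
words <- strsplit("the 2nd run took 45 seconds", " ", fixed = TRUE)[[1]]

first_any <- grep("\\d+", words)[1]     # 2: "2nd" contains a digit
first_pure <- grep("^\\d+$", words)[1]  # 5: "45" is the first all-digit word
c(first_any, first_pure)
```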

Upvotes: 1

Andrew

Reputation: 5138

Here is a base R solution that uses rapply() with grep() to recurse through the results of strsplit(); it works with a vector of strings.

Note: swap " " and fixed = TRUE with "\\s+" and fixed = FALSE (the default) if you want to split the strings on any whitespace instead of a literal space.

rapply(strsplit(strings, " ", fixed = TRUE), function(x) grep("[0-9]+", x)[1])  # [1]: first numeric word only
[1] 9 1 5

Data:

strings = c("Hello I'd like to extract where the first 1010 is in this string", 
            "80111 is in this string", "extract where the first 97865 is in this string")
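To see why the whitespace note matters: splitting on a literal space turns every extra space into an empty string, and those empty strings shift the word count. A short sketch using a string with repeated spaces, like the one in ekoam's data:

```r
s <- "extract where the   first  97865 is in this string"

words_fixed <- strsplit(s, " ", fixed = TRUE)[[1]]  # contains "" for each extra space
words_ws <- strsplit(s, "\\s+")[[1]]                # runs of whitespace are collapsed

pos_fixed <- grep("[0-9]+", words_fixed)[1]  # 8: the empty strings are counted as words
pos_ws <- grep("[0-9]+", words_ws)[1]        # 5
c(pos_fixed, pos_ws)
```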

Upvotes: 0

Tim Biegeleisen

Reputation: 522244

I would just use grep and strsplit here for a base R option:

sapply(input, function(x) grep("\\d+", strsplit(x, " ")[[1]])[1])  # [1]: first numeric word only

Hello I'd like to extract where the first 1010 is in this string
                                                               9
                                         80111 is in this string
                                                               1
                 extract where the first 97865 is in this string
                                                               5

Data:

input <- c("Hello I'd like to extract where the first 1010 is in this string",
           "80111 is in this string",
           "extract where the first 97865 is in this string")
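sapply() only simplifies to a vector when every call returns exactly one value. If a string contains two numbers, grep() returns two positions for it and sapply() silently falls back to a list. A sketch with a hypothetical second string; indexing with [1] and setting USE.NAMES = FALSE gives a plain, unnamed integer vector:

```r
input2 <- c("80111 is in this string",
            "call 911 or 112")  # hypothetical string with two numbers

all_pos <- sapply(input2, function(x) grep("\\d+", strsplit(x, " ")[[1]]))
is.list(all_pos)  # TRUE: the results have lengths 1 and 2

first_pos <- sapply(input2, function(x) grep("\\d+", strsplit(x, " ")[[1]])[1],
                    USE.NAMES = FALSE)
first_pos  # returns c(1L, 2L)
```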

Upvotes: 6

Ben Norris

Reputation: 5747

Here is a way to return your desired output:

library(stringr)
min(which(!is.na(suppressWarnings(as.numeric(str_split(string, " ", simplify = TRUE))))))

This is how it works:

str_split(string, " ", simplify = TRUE) # splits your string at each space, returning a one-row character matrix

as.numeric(...) # tries to convert each word to a number, returning NA when it fails

suppressWarnings(...) # suppresses the "NAs introduced by coercion" warnings from as.numeric

!is.na(...) # returns TRUE for the values that are not NA (i.e. the numbers)

which(...) # returns the positions of the TRUE values

min(...) # returns the smallest, i.e. first, position

The output:

min(which(!is.na(suppressWarnings(as.numeric(str_split(string1, " ", simplify = TRUE))))))
[1] 9
min(which(!is.na(suppressWarnings(as.numeric(str_split(string2, " ", simplify = TRUE))))))
[1] 1
min(which(!is.na(suppressWarnings(as.numeric(str_split(string3, " ", simplify = TRUE))))))
[1] 5
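The same pipeline can be traced step by step in base R, with strsplit() standing in for str_split(). One caveat of min(which(...)): when no word parses as a number, which() returns integer(0) and min() returns Inf with a warning, whereas which(...)[1] returns NA. A sketch (first_numeric_pos is a hypothetical name):

```r
# Trace the intermediate values for string2.
string2 <- "80111 is in this string"
words <- strsplit(string2, " ", fixed = TRUE)[[1]]  # "80111" "is" "in" "this" "string"
nums <- suppressWarnings(as.numeric(words))         # 80111 NA NA NA NA
which(!is.na(nums))                                 # 1

# Hypothetical helper: which(...)[1] returns NA rather than Inf when nothing parses.
first_numeric_pos <- function(s) {
  nums <- suppressWarnings(as.numeric(strsplit(s, " ", fixed = TRUE)[[1]]))
  which(!is.na(nums))[1]
}
first_numeric_pos(string2)            # 1
first_numeric_pos("no numbers here")  # NA
```

Because as.numeric() must parse the whole word, a number followed by punctuation, such as "1010,", is not detected by this approach.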

Upvotes: 4
