Reputation: 2940
How can I create a function in R that locates the word position of the first number in a string?
For example:
string1 <- "Hello I'd like to extract where the first 1010 is in this string"
#desired_output for string1
9
string2 <- "80111 is in this string"
#desired_output for string2
1
string3 <- "extract where the first 97865 is in this string"
#desired_output for string3
5
Upvotes: 1
Views: 1373
Reputation: 8844
Here is another approach. Truncate each string right after the first digit of its first number, then count the words in what remains; that count is the word position of the number. \\b matches a word boundary, while \\S+ matches one or more non-whitespace characters.
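first_numeric_word <- function(x) {
  # Keep everything up to the first digit of the first standalone number
  x <- substr(x, 1L, regexpr("\\b\\d+\\b", x))
  # Count the words in the truncated string
  lengths(gregexpr("\\b\\S+\\b", x))
}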
Output
> first_numeric_word(x)
[1] 9 1 5
Data
x <- c(
  "Hello I'd like to extract where the first 1010 is in this string",
  "80111 is in this string",
  "extract where the first 97865 is in this string"
)
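To make the two steps visible, here is a minimal sketch that breaks the function above into its intermediate values. The name first_numeric_word_na is hypothetical, and the NA guard is an addition that is not in the original answer: when a string has no standalone number, regexpr() returns -1, the truncated string is empty, and the word count would otherwise come back as 1.

```r
# Hypothetical variant of first_numeric_word() that returns NA when no number is found
first_numeric_word_na <- function(x) {
  # Character offset of the first standalone run of digits (-1 if there is none)
  start <- as.integer(regexpr("\\b\\d+\\b", x))
  # Keep everything up to and including the first digit of that number
  head_part <- substr(x, 1L, start)
  # Count the words in the truncated string
  pos <- lengths(gregexpr("\\b\\S+\\b", head_part))
  # Assumed convention: report NA instead of a misleading 1 when nothing matched
  pos[start == -1L] <- NA_integer_
  pos
}

x <- c(
  "Hello I'd like to extract where the first 1010 is in this string",
  "80111 is in this string",
  "extract where the first 97865 is in this string",
  "no digits here"
)
result <- first_numeric_word_na(x)
result
```

Whether NA is the right answer for a string without numbers depends on the caller; returning 0 or raising an error would be equally reasonable choices.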
Upvotes: 1
Reputation: 866
Try the following:
library(stringr)
position_first_number <- function(string) {
  min(which(str_detect(str_split(string, "\\s+", simplify = TRUE), "[0-9]+")))
}
With your example strings:
> string1 <- "Hello I'd like to extract where the first 1010 is in this string"
> position_first_number(string1)
[1] 9
> string2 <- "80111 is in this string"
> position_first_number(string2)
[1] 1
> string3 <- "extract where the first 97865 is in this string"
> position_first_number(string3)
[1] 5
Upvotes: 0
Reputation: 7858
Here is a fully tidyverse approach:
library(purrr)
library(stringr)
map_dbl(str_split(strings, " "), str_which, "\\d+")
#> [1] 9 1 5
map_dbl(str_split(strings[1], " "), str_which, "\\d+")
#> [1] 9
Note that it works with a single string as well as with a vector of strings. Here, strings is:
strings <- c("Hello I'd like to extract where the first 1010 is in this string",
"80111 is in this string",
"extract where the first 97865 is in this string")
Upvotes: 1
Reputation: 5138
Here is a base R solution that uses rapply() with grep() to recurse through the results of strsplit(); it works with a vector of strings.
Note: swap " " and fixed = TRUE for "\\s+" and fixed = FALSE (the default) if you want to split the strings on any whitespace instead of a literal space.
rapply(strsplit(strings, " ", fixed = TRUE), function(x) grep("[0-9]+", x))
[1] 9 1 5
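The difference between the two split patterns shows up when words are separated by more than one space. A small sketch, using a made-up input string that is not part of the original data:

```r
s <- "first  1010 here"  # made-up input with two spaces after "first"

# Literal single-space split: the doubled space yields an empty element
literal <- strsplit(s, " ", fixed = TRUE)[[1]]
# Regex split on runs of whitespace: no empty elements
regex <- strsplit(s, "\\s+")[[1]]

pos_literal <- grep("[0-9]+", literal)  # 3, the empty element is counted as a word
pos_regex <- grep("[0-9]+", regex)      # 2
```

Which result counts as correct depends on whether consecutive spaces should be treated as one separator.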
Data:
strings = c("Hello I'd like to extract where the first 1010 is in this string",
"80111 is in this string", "extract where the first 97865 is in this string")
Upvotes: 0
Reputation: 522244
I would just use grep and strsplit here for a base R option:
sapply(input, function(x) grep("\\d+", strsplit(x, " ")[[1]]))
Hello I'd like to extract where the first 1010 is in this string
9
80111 is in this string
1
extract where the first 97865 is in this string
5
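If a string can contain more than one number, grep() returns every matching position, and sapply() may then return a list rather than a vector. A minimal sketch that keeps only the first match, assuming NA is an acceptable result for strings without any digit; first_number_pos is a hypothetical name, and the example strings are made up:

```r
# Hypothetical helper: word position of the first word containing a digit
first_number_pos <- function(strings) {
  words <- strsplit(strings, " ", fixed = TRUE)
  # grep() lists all matching positions; [1] keeps the first (NA if there is none)
  vapply(words, function(w) grep("\\d", w)[1], integer(1))
}

res <- first_number_pos(c("80111 is in this string", "call 555 or 777", "no digits"))
res
```

Using vapply() with integer(1) makes the return type predictable: the result is always an integer vector with one element per input string.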
Data:
input <- c("Hello I'd like to extract where the first 1010 is in this string",
"80111 is in this string",
"extract where the first 97865 is in this string")
Upvotes: 6
Reputation: 5747
Here is a way to return your desired output:
library(stringr)
min(which(!is.na(suppressWarnings(as.numeric(str_split(string, " ", simplify = TRUE))))))
This is how it works:
str_split(string, " ", simplify = TRUE) # splits your string at each space and returns a character matrix
as.numeric(...) # tries to convert each element to a number, returning NA when it fails
suppressWarnings(...) # suppresses the warnings generated by as.numeric
!is.na(...) # returns TRUE for the values that are not NA (i.e. the numbers)
which(...) # returns the positions of the TRUE values
min(...) # returns the first position
The output:
min(which(!is.na(suppressWarnings(as.numeric(str_split(string1, " ", simplify = TRUE))))))
[1] 9
min(which(!is.na(suppressWarnings(as.numeric(str_split(string2, " ", simplify = TRUE))))))
[1] 1
min(which(!is.na(suppressWarnings(as.numeric(str_split(string3, " ", simplify = TRUE))))))
[1] 5
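The same steps can be written out one at a time in base R. This is a sketch rather than the answer's own code: it swaps str_split() for strsplit() and uses which(...)[1] so that a string without numbers gives NA. Because as.numeric() only converts tokens that are numbers in their entirety, a token such as "1010," would not count.

```r
string1 <- "Hello I'd like to extract where the first 1010 is in this string"

tokens <- strsplit(string1, " ", fixed = TRUE)[[1]]  # split at single spaces
nums <- suppressWarnings(as.numeric(tokens))         # NA for non-numeric tokens
first_pos <- which(!is.na(nums))[1]                  # position of the first numeric token
first_pos
#> [1] 9
```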
Upvotes: 4