In my data, I have a column of open text field data that resembles the following sample:
library(tibble)

d <- tribble(
  ~x,
  "i am 10 and she is 50",
  "he is 32 and i am 22",
  "he may be 70 and she may be 99"
)
I would like to use a regex to extract all two-digit numbers into a new column called y. I have the following code, and it works well for extracting the first match:
library(dplyr)
library(stringr)

d %>%
  mutate(y = str_extract(x, "([0-9]{2})"))
# A tibble: 3 x 2
x y
<chr> <chr>
1 i am 10 and she is 50 10
2 he is 32 and i am 22 32
3 he may be 70 and she may be 99 70
But is there a way to extract both two-digit numbers into the same column, with a standard separator (e.g., a comma)?
We can also use extract and unite from tidyr:
library(dplyr)
library(tidyr)

d %>%
  extract(x, c('y', 'z'), regex = "(\\d+)[^\\d]+(\\d+)", remove = FALSE)
Output:
# A tibble: 3 x 3
x y z
<chr> <chr> <chr>
1 i am 10 and she is 50 10 50
2 he is 32 and i am 22 32 22
3 he may be 70 and she may be 99 70 99
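As a quick way to inspect what extract() captures, the same pattern can be passed to stringr::str_match(), which returns the whole match in column 1 and one column per capture group after it. This is a minimal sketch for inspection only (the names pattern, x and m are illustrative). It also shows that (\\d+) matches numbers of any length, and that only the first two numbers in a string are captured.

```r
library(stringr)

# Same regex as in the extract() call: two runs of digits separated by
# one or more non-digit characters (pattern, x, m are illustrative names)
pattern <- "(\\d+)[^\\d]+(\\d+)"

x <- c("i am 10 and she is 50",
       "he is 32 and i am 22",
       "he may be 70 and she may be 99")

# Column 1 holds the full match; columns 2 and 3 hold the capture groups
m <- str_match(x, pattern)
m[, 2:3]

# (\\d+) is not limited to two digits, and a third number is ignored
str_match("ages 7, 100 and 45", pattern)[, 2:3]
```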
To return a single column:
d %>%
  extract(x, c('y', 'z'), regex = "(\\d+)[^\\d]+(\\d+)", remove = FALSE) %>%
  unite('y', y, z, sep = ', ')
Output:
# A tibble: 3 x 2
x y
<chr> <chr>
1 i am 10 and she is 50 10, 50
2 he is 32 and i am 22 32, 22
3 he may be 70 and she may be 99 70, 99
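Because unite() pastes the selected columns together with sep and then drops them, the single-column result can also be written with paste() inside mutate(). A minimal sketch comparing the two, assuming the same tibble d as in the question (parts, via_unite and via_paste are illustrative names):

```r
library(dplyr)
library(tidyr)
library(tibble)

d <- tribble(
  ~x,
  "i am 10 and she is 50",
  "he is 32 and i am 22",
  "he may be 70 and she may be 99"
)

# Split the two numbers into columns y and z, keeping x
parts <- d %>%
  extract(x, c("y", "z"), regex = "(\\d+)[^\\d]+(\\d+)", remove = FALSE)

# unite() joins y and z with ", " and removes the input columns
via_unite <- parts %>% unite("y", y, z, sep = ", ")

# The same column built with paste(), then z dropped explicitly
via_paste <- parts %>%
  mutate(y = paste(y, z, sep = ", ")) %>%
  select(x, y)

identical(via_unite$y, via_paste$y)
# [1] TRUE
```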
We can use str_extract_all instead of str_extract, because str_extract matches only the first instance, whereas the _all variant is global and extracts all instances into a list. That list can then be spread into two columns with unnest_wider:
library(dplyr)
library(tidyr)
library(stringr)

d %>%
  mutate(out = str_extract_all(x, "\\d{2}")) %>%
  unnest_wider(c(out)) %>%
  rename_at(-1, ~ c('y', 'z')) %>%
  type.convert(as.is = TRUE)
# A tibble: 3 x 3
# x y z
# <chr> <int> <int>
#1 i am 10 and she is 50 10 50
#2 he is 32 and i am 22 32 22
#3 he may be 70 and she may be 99 70 99
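To see the intermediate list that unnest_wider() spreads into columns, the two stringr functions can be compared directly. A minimal sketch (x, first, all_matches and wide are illustrative names). The simplify = TRUE argument of str_extract_all() is shown as one alternative to unnest_wider(): it returns a character matrix, padding with empty strings when strings have different numbers of matches.

```r
library(stringr)

x <- c("i am 10 and she is 50",
       "he is 32 and i am 22",
       "he may be 70 and she may be 99")

# str_extract(): one value per string, first match only
first <- str_extract(x, "\\d{2}")
first
# [1] "10" "32" "70"

# str_extract_all(): a list with one character vector of matches per string;
# this list column is what unnest_wider() turns into separate columns
all_matches <- str_extract_all(x, "\\d{2}")
all_matches[[2]]
# [1] "32" "22"

# simplify = TRUE returns a character matrix instead of a list
wide <- str_extract_all(x, "\\d{2}", simplify = TRUE)
dim(wide)
# [1] 3 2
```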
If we need a single string column with ", " as the separator, extract into a list as before, then loop over the list with map_chr and concatenate the elements of each vector into one string with toString (a wrapper for paste(., collapse = ", ")):
library(purrr)

d %>%
  mutate(y = str_extract_all(x, "\\b\\d{2}\\b") %>%
           map_chr(toString))
# A tibble: 3 x 2
# x y
# <chr> <chr>
#1 i am 10 and she is 50 10, 50
#2 he is 32 and i am 22 32, 22
#3 he may be 70 and she may be 99 70, 99
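The pattern here is wrapped in word boundaries (\\b\\d{2}\\b), while the unnest_wider version uses plain \\d{2}. The difference only shows up when the text contains a number with more than two digits. A minimal sketch on a made-up string (s is an illustrative name), which also checks that toString() gives the same result as paste(collapse = ", "):

```r
library(stringr)

s <- "she is 7, he is 100 and i am 45"

# Without word boundaries, "\\d{2}" also matches the first two digits of "100"
str_extract_all(s, "\\d{2}")[[1]]
# [1] "10" "45"

# With \\b on both sides, only standalone two-digit numbers match
str_extract_all(s, "\\b\\d{2}\\b")[[1]]
# [1] "45"

# toString() collapses a vector with ", ", just like paste(collapse = ", ")
toString(c("10", "50"))
# [1] "10, 50"
identical(toString(c("10", "50")), paste(c("10", "50"), collapse = ", "))
# [1] TRUE
```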