Susie Derkins

Reputation: 2638

How to use stringr::str_match_all inside dplyr::mutate in the tidyverse pipe

Using stringr::str_match, I can create a column that contains the characters after "H45" for the first instance of "H45" in each row.

library(dplyr)
library(stringr)

df <- tibble::tibble(A = c("H459 A452 H4544", "A452", "H4535"))

df <- df %>%
  mutate(H45_value = str_match(A, 'H45([[0-9]]{1,2})') %>% .[, 2])

I would like to create a column using stringr::str_match_all that contains the characters after every appearance of "H45" in each row. However, I can't get str_match_all to run in the tidyverse pipe. I think it's because I don't know the correct syntax for calling [[1]][,2] within the pipe.

It works as an independent line of code:

str_match_all("H459 A452 H4544", 'H45([[0-9]]{1,2})')[[1]][,2]

I am hoping for output like this, where the first value of "H45_value" is a list or similar:

A                H45_value
H459 A452 H4544  9, 44
A452             NA
H4535            35

Upvotes: 3

Views: 313

Answers (1)

lroha

Reputation: 34586

str_extract_all() is a better choice here: it returns a list of character vectors (one per row), whereas str_match_all() returns a list of matrices that still have to be indexed. With a lookbehind to skip the "H45" prefix, you could do:

library(dplyr)
library(stringr)

df %>%
  mutate(H45_value = str_extract_all(A, "(?<=H45)\\d+"))

# A tibble: 3 × 2
  A               H45_value
  <chr>           <list>   
1 H459 A452 H4544 <chr [2]>
2 A452            <chr [0]>
3 H4535           <chr [1]>

Where H45_value contains:

[[1]]
[1] "9"  "44"

[[2]]
character(0)

[[3]]
[1] "35"
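
If you would rather have a plain character column like the one sketched in the question ("9, 44", NA, "35") than a list column, one possible approach is to collapse each list element into a single string and map empty results to NA. This is only a sketch: the intermediate column name H45_list and the ", " separator are my own choices, not something the question specifies.

```r
library(dplyr)
library(stringr)

df <- tibble::tibble(A = c("H459 A452 H4544", "A452", "H4535"))

result <- df %>%
  mutate(
    # List column: one character vector of matches per row
    H45_list = str_extract_all(A, "(?<=H45)\\d+"),
    # Collapse each vector to one string; rows with no match become NA
    H45_value = vapply(
      H45_list,
      function(x) if (length(x) == 0) NA_character_ else paste(x, collapse = ", "),
      character(1)
    )
  )

result$H45_value
# [1] "9, 44" NA      "35"
```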

If you want to use str_match_all(), you need to iterate over the resulting list and extract the second column of each matrix:

df %>%
  mutate(H45_value = lapply(str_match_all(A, 'H45([[0-9]]{1,2})'), `[`, , 2))
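
To see why the extra indexing step is needed, here is a small sketch that inspects what str_match_all() returns for the same three strings outside of mutate(). The variable names m and vals are just for illustration, and I have written the digit class as [0-9] rather than the question's [[0-9]] for simplicity.

```r
library(stringr)

A <- c("H459 A452 H4544", "A452", "H4535")

# One character matrix per input string:
# column 1 holds the full match, column 2 the capture group
m <- str_match_all(A, "H45([0-9]{1,2})")
m[[1]]

# `[`(x, , 2) is the function form of x[, 2]; lapply() applies it
# to every matrix, giving one character vector per input string
vals <- lapply(m, `[`, , 2)
vals
```

A string with no match gives a zero-row matrix, so its element in vals is character(0) rather than NA, the same as with str_extract_all().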

Upvotes: 1
