str_extract all syntax

Question

I need some help with stringr::str_extract_all

x is the name of my data frame.

V1
(A_K9B,A_K9one,A_K9two,B_U10J)

x = x %>% 
  mutate(N_alph = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[A-Z]'), toString))
x = x %>% 
  mutate(N_.1 = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[o][n][e]'), toString))
x = x %>% 
  mutate(N_.2 = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[t][w][o]'), toString))

This is my current output:

V1                                N_alph  N_.1     N_.2
(A_K9B,A_K9one,A_K9two,B_U10J)   A_K9B   A_K9one  A_K9two

I am fine with my column N_alph as is I want it separate from the other two. But Ideally I would like to avoid typing [o][n][e] and [t][w][o] for those variables that are followed by words rather than one alphabetical letter, if I use:

x = x %>% 
  mutate(N_alph = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[A-Z]'), toString))
x = x %>% 
  mutate(N_all.words = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[\w+]'), toString))

Output is:

V1                                N_alph  N_all.words    
(A_K9B,A_K9one,A_K9two,B_U10J)   A_K9B   A_K9B,A_K9o,A_K9t

Desired output would be

V1                                N_alph  N_all.words    
(A_K9B,A_K9one,A_K9two,B_U10J)   A_K9B   A_K9one,A_K9two

koolmees · Accepted Answer

When you use metacharacters like \w, \b, \s, etc., you don't need the square brackets. But if you do use the square brackets than the + would need to be outside. Also, the number group should be [0-9] as we are talking about individual characters, not combinations of characters. To take into account numbers higher than 9 we just expand the amount of times we check for the group with {} brackets, or simply the + operator. The final result looks like so:

x %>% 
  mutate(N_all.words = str_extract_all(V1, 'A_([A-Z][0-9]{1,2})\w+'))

Resulting to:

                              V1             N_all.words
1 (A_K9B,A_K9one,A_K9two,B_U10J) A_K9B, A_K9one, A_K9two

I also created a version that I found a little tidier:

x %>% 
  mutate(N_all.words = str_extract_all(V1, 'A_\w\d{1,2}\w+'))

str_extract all syntax

Answers (1)

Related Questions