Reputation: 55
I need some help with stringr::str_extract_all
x is the name of my data frame:
V1
(A_K9B,A_K9one,A_K9two,B_U10J)
x = x %>%
mutate(N_alph = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[A-Z]'), toString))
x = x %>%
mutate(N_.1 = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[o][n][e]'), toString))
x = x %>%
mutate(N_.2 = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[t][w][o]'), toString))
This is my current output:
V1 N_alph N_.1 N_.2
(A_K9B,A_K9one,A_K9two,B_U10J) A_K9B A_K9one A_K9two
I am fine with my column N_alph as is; I want it kept separate from the other two. Ideally, though, I would like to avoid typing [o][n][e] and [t][w][o] for the values that are followed by a word rather than a single letter. If I use:
x = x %>%
mutate(N_alph = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[A-Z]'), toString))
x = x %>%
mutate(N_all.words = map_chr(str_extract_all(x$V1, 'A_([A-Z][0-10])[\\w+]'), toString))
Output is:
V1 N_alph N_all.words
(A_K9B,A_K9one,A_K9two,B_U10J) A_K9B A_K9B,A_K9o,A_K9t
Desired output would be:
V1 N_alph N_all.words
(A_K9B,A_K9one,A_K9two,B_U10J) A_K9B A_K9one,A_K9two
Upvotes: 0
Views: 101
Reputation: 2783
When you use metacharacters like \w, \b, \s, etc., you don't need the square brackets. If you do use square brackets, then the + has to go outside them; inside a character class it is just a literal plus sign, so [\\w+] matches exactly one character. Also, the digit class should be [0-9], because a character class describes a single character, not a range of multi-digit numbers ([0-10] is read as the range 0-1 plus the character 0). To allow numbers higher than 9, repeat the class with a {} quantifier or simply the + operator. The final result looks like so:
x %>%
  mutate(N_all.words = str_extract_all(V1, 'A_([A-Z][0-9]{1,2})\\w+'))
This results in:
V1 N_all.words
1 (A_K9B,A_K9one,A_K9two,B_U10J) A_K9B, A_K9one, A_K9two
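To make the two points above concrete, here is a small sketch that compares the bracketed and unbracketed forms on toy strings. The test strings and variable names are made up for illustration; only stringr functions are used.

```r
library(stringr)

# [\\w+] is a character class: exactly ONE word character or a literal "+"
one_char <- str_extract_all("K9one", "9[\\w+]")[[1]]
one_char
# "9o"

# \\w+ outside brackets: one or more word characters
many_chars <- str_extract_all("K9one", "9\\w+")[[1]]
many_chars
# "9one"

# [0-10] is the range 0-1 plus "0", so it only matches a single 0 or 1
bad_class <- str_detect(c("0", "1", "9"), "^[0-10]$")
bad_class
# TRUE TRUE FALSE

# [0-9]{1,2} matches one or two digits, which covers 0 through 99
good_class <- str_detect(c("9", "10", "100"), "^[0-9]{1,2}$")
good_class
# TRUE TRUE FALSE
```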
I also created a version that I found a little tidier:
x %>%
  mutate(N_all.words = str_extract_all(V1, 'A_\\w\\d{1,2}\\w+'))
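Both patterns above also pick up A_K9B, because \w+ matches a single letter as well as a word. If the goal is the exact desired output from the question, one possible sketch is below. It rests on an assumption that is not stated in the question: single-letter suffixes are always uppercase and word suffixes are always lowercase. The toy data frame is rebuilt here so the snippet runs on its own, and map_chr(..., toString) is borrowed from the question to flatten the list column into one string per row.

```r
library(dplyr)
library(purrr)
library(stringr)

# Toy copy of the data frame from the question
x <- tibble(V1 = "(A_K9B,A_K9one,A_K9two,B_U10J)")

res <- x %>%
  mutate(
    # Assumption: a single-letter suffix is uppercase
    N_alph = map_chr(str_extract_all(V1, "A_[A-Z][0-9]{1,2}[A-Z]"), toString),
    # Assumption: a word suffix consists of lowercase letters only
    N_all.words = map_chr(str_extract_all(V1, "A_[A-Z][0-9]{1,2}[a-z]+"), toString)
  )

res$N_alph
# "A_K9B"
res$N_all.words
# "A_K9one, A_K9two"
```

If the real data mixes cases in the suffix, this split will not hold and the pattern would need a different rule, for example a minimum suffix length.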
Upvotes: 1