Reputation: 2421
The data frame below has a sku_name column. I want to remove the part that starts with 'V' and ends with 'b', but my code str_remove_all(sku_name,"^(V).*?(\\b)$")
doesn't work.
Can anyone help?
library(dplyr)
library(stringr)
mydata <- data.frame(sku_name=c('wk0001 V1b','123780 PRO V326b','ttttt V321b'))
mydata %>% mutate(sku_name_new=str_remove_all(sku_name,"^(V).*?(\\b)$"))
Upvotes: 2
Views: 1032
Reputation: 549
You were actually really close.
Fix the regex using one of the alternatives mentioned by @2evans and you're done!
Here is the code using a dplyr pipeline, since that may suit you better.
library(dplyr)
library(stringr)
mydata <- data.frame(sku_name=c('wk0001 V1b','123780 PRO V326b','ttttt V321b'))
mydata %>% mutate(sku_name_new=str_remove_all(sku_name,"V.*b$"))
          sku_name sku_name_new
1       wk0001 V1b      wk0001 
2 123780 PRO V326b  123780 PRO 
3      ttttt V321b       ttttt 
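Note that removing only the "V...b" part leaves the space in front of it behind, so the new values end in a trailing space. A minimal sketch of two ways to drop that space as well, written with base R's sub() and trimws() so it runs without extra packages; the "\\s*" prefix is an assumption added here, not part of the pattern above:

```r
sku_name <- c('wk0001 V1b', '123780 PRO V326b', 'ttttt V321b')

# Option 1: remove the suffix, then trim the leftover whitespace.
cleaned_trim <- trimws(sub("V.*b$", "", sku_name))

# Option 2: let the pattern also consume the whitespace in front of the "V".
cleaned_regex <- sub("\\s*V.*b$", "", sku_name)

print(cleaned_trim)
# [1] "wk0001"     "123780 PRO" "ttttt"
print(cleaned_regex)
# [1] "wk0001"     "123780 PRO" "ttttt"
```

The same patterns should carry over to stringr::str_remove() inside mutate().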
Upvotes: 0
Reputation: 1925
You can do it with this pattern:
vector <- c('wk0001 V1b','123780 PRO V326b','ttttt V321b')
# If only digits can appear between the "V" and the "b".
stringr::str_remove(vector , "V\\d+b")
# If any characters can appear between the "V" and the "b" (at least one, and none of them "V" or "b").
stringr::str_remove(vector , "V[^Vb]+b")
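These patterns are not anchored to the end of the string, which matters once a "V" or "b" can appear earlier in the name. A small sketch of the difference, using a made-up SKU ("VIP box V12b" is hypothetical, not from the question) and base R's sub() with perl = TRUE as a stand-in for stringr::str_remove():

```r
# Hypothetical SKU with a "V" and a "b" before the version suffix.
x <- "VIP box V12b"

# Digits only between "V" and "b": only the suffix is removed.
digits_only <- sub("V\\d+b", "", x, perl = TRUE)

# Unanchored character class: the earlier "VIP b" is matched instead.
unanchored <- sub("V[^Vb]+b", "", x, perl = TRUE)

# Adding "$" forces the match to sit at the end of the string.
anchored <- sub("V[^Vb]+b$", "", x, perl = TRUE)

# Greedy "V.*b$" starts at the first "V" and removes everything.
greedy <- sub("V.*b$", "", x, perl = TRUE)

print(c(digits_only, unanchored, anchored, greedy))
# [1] "VIP box " "ox V12b"  "VIP box " ""
```

If the suffix is always at the end, adding "$" to whichever pattern you pick is probably the safer choice.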
Upvotes: 1
Reputation: 160417
vec <- c('wk0001 V1b','123780 PRO V326b','ttttt V321b')
sub("V.*b$", "", vec)
# [1] "wk0001 " "123780 PRO " "ttttt "
stringr::str_remove(vec, "V.*b$")
# [1] "wk0001 " "123780 PRO " "ttttt "
This also works with the non-greedy "V.*?b$", over to you if that's necessary.
BTW: \\b is a word boundary, not the literal b. (V) saves the V as a capture group, which is not necessary (and looks a little confusing). The real culprit is that you included ^, which means start of string (as you mentioned), so the pattern can only match a string that starts with V, i.e. one of the form "Vsomethingb". The current vec strings start with "w", "1", and "t"; none of them starts with V.
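A quick way to check both points, sketched with base R's grepl() and perl = TRUE (the "wk0001 V1x" string is a made-up example, not from the question):

```r
vec <- c('wk0001 V1b', '123780 PRO V326b', 'ttttt V321b')

# Anchored at the start: no string begins with "V", so nothing matches.
with_caret <- grepl("^V.*b$", vec, perl = TRUE)

# Without "^" the suffix is found in every string.
without_caret <- grepl("V.*b$", vec, perl = TRUE)

# "\\b" is a zero-width word boundary, so it is satisfied after an "x" too,
# whereas a literal "b" is not.
boundary_hit <- grepl("V.*\\b$", "wk0001 V1x", perl = TRUE)
literal_hit  <- grepl("V.*b$",   "wk0001 V1x", perl = TRUE)

print(with_caret)     # [1] FALSE FALSE FALSE
print(without_caret)  # [1] TRUE TRUE TRUE
print(boundary_hit)   # [1] TRUE
print(literal_hit)    # [1] FALSE
```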
If you need a guide for regex, https://stackoverflow.com/a/22944075/3358272 is a good guide of many components (and links to questions/answers about them).
Upvotes: 5