Reputation: 1580
Say I have a character vector ids as follows:
ids <- c("367025001", "CT_341796001", "M13X01692-01", "13C025050901", "13C00699551")
I want to search each element and remove all letters, all special characters, and "01" when it ends the element. So ids would become:
ids_replaced <- c("3670250", "3417960", "1301692", "130250509", "1300699551")
I'm getting somewhat close, but my attempt doesn't work the way I intended:
gsub("(.*?)(\\d+?)(01$)", "\\2", ids, perl = TRUE)
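For reference, here is a small sketch of what that attempt returns on the sample data. The variable name attempt is only for illustration. Because the replacement keeps just capture group 2, only the digit run directly before a trailing "01" survives, and any element the pattern cannot match is returned unchanged.

```r
ids <- c("367025001", "CT_341796001", "M13X01692-01", "13C025050901", "13C00699551")

# The attempt from the question: replace the whole match with capture group 2.
attempt <- gsub("(.*?)(\\d+?)(01$)", "\\2", ids, perl = TRUE)
print(attempt)

# Element 3 ends in "-01", so there is no digit directly before "01": no match.
# Element 4 matches, but the leading "13" lands in group 1 and is dropped.
# Element 5 does not end in "01": no match, so the letter "C" stays.
```

So the first two elements come out right, while the last three do not.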
Upvotes: 1
Views: 605
Reputation: 4767
Using rex may make this type of task a little simpler.
library(rex)
ids <- c("367025001", "CT_341796001", "M13X01692-01", "13C025050901", "13C00699551")
re_substitutes(ids,
  rex(non_digits %or% list("01", end)),
  '',
  global = TRUE)
#> [1] "3670250" "3417960" "1301692" "130250509" "1300699551"
Upvotes: 1
Reputation: 99331
You could use
gsub("01$|\\D", "", ids)
# [1] "3670250" "3417960" "1301692" "130250509" "1300699551"
identical(gsub("01$|\\D", "", ids), ids_replaced)
# [1] TRUE
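One thing worth checking, as a hedged aside: the pattern above works in a single pass, so "01" is only dropped when it is already at the end of the original string. If the intent is instead "strip non-digits first, then drop a trailing 01 from what is left", a two-step version behaves differently on some inputs. The helper names strip_one_pass and strip_two_pass and the edge vector below are made up for illustration.

```r
# Hypothetical helpers, for illustration only.
strip_one_pass <- function(x) gsub("01$|\\D", "", x)
strip_two_pass <- function(x) sub("01$", "", gsub("\\D", "", x))

ids <- c("367025001", "CT_341796001", "M13X01692-01", "13C025050901", "13C00699551")

# On the sample data both approaches agree.
print(identical(strip_one_pass(ids), strip_two_pass(ids)))

# Inputs where "01" only ends the string after the non-digits are removed.
edge <- c("010101", "1230-1", "12301X")
print(strip_one_pass(edge))  # keeps the "01" in the last two elements
print(strip_two_pass(edge))  # drops it, since it is trailing after cleanup
```

Which behaviour is wanted depends on how the question is read; for the ids given, either one reproduces ids_replaced.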
Regular Expression Explanation:
01 matches "01"
$ before an optional \n, and the end of the string
| OR
\D matches non-digits (all but 0-9)
Upvotes: 2
Reputation: 30985
I'm not sure how to do it in R, but you can use this regex:
-\d+$|\D
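For anyone trying this in R, a minimal sketch of how that regex could be plugged into gsub (backslashes doubled for the R string; the variable name result is just for illustration). Note that it only removes trailing digits when they follow a hyphen, so on the sample data it does not strip a bare trailing "01".

```r
ids <- c("367025001", "CT_341796001", "M13X01692-01", "13C025050901", "13C00699551")
ids_replaced <- c("3670250", "3417960", "1301692", "130250509", "1300699551")

# Same regex as above, escaped for an R string literal.
result <- gsub("-\\d+$|\\D", "", ids)
print(result)

# Only the elements with "-01" at the end or with no trailing "01" line up.
print(result == ids_replaced)
```

Elements 3 and 5 match the desired output; elements 1, 2 and 4 keep their trailing "01".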
Upvotes: 0