mcjudd

Reputation: 1580

R - Use regex to remove all letters, special characters, and a pattern ending the element

Say I have a character vector ids as follows:

ids <- c("367025001", "CT_341796001", "M13X01692-01", "13C025050901", "13C00699551")

I want to search each element and remove all letters, all special characters, and "01" when it ends the element. So ids would become:

ids_replaced <- c("3670250", "3417960", "1301692", "130250509", "1300699551")

My attempt below comes close, but it doesn't produce the result I intended:

gsub("(.*?)(\\d+?)(01$)", "\\2", ids, perl = TRUE)

Upvotes: 1

Views: 605

Answers (3)

Jim

Reputation: 4767

Using rex may make this type of task a little simpler.

library(rex)

ids <- c("367025001", "CT_341796001", "M13X01692-01", "13C025050901", "13C00699551")

re_substitutes(ids,
  rex(non_digits %or% list("01", end)),
  '',
  global = TRUE)

#> [1] "3670250"    "3417960"    "1301692"    "130250509"  "1300699551"

Upvotes: 1

Rich Scriven

Reputation: 99331

You could use

gsub("01$|\\D", "", ids)
# [1] "3670250"    "3417960"    "1301692"    "130250509"  "1300699551"
identical(gsub("01$|\\D", "", ids), ids_replaced)
# [1] TRUE

Regular Expression Explanation:

  • 01 matches the literal characters "01"
  • $ anchors the match at the end of the string
  • | acts as OR between the two alternatives
  • \D matches any non-digit character (anything but 0-9)
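
To see what each alternative contributes, the pattern can be split into two passes. This is only a sketch: the names digits_only and trimmed are made up for illustration, and the two-pass form is not equivalent to the single gsub call for every input, since it also strips a "01" that only becomes trailing once the non-digits are gone (for example "1201X").

```r
ids <- c("367025001", "CT_341796001", "M13X01692-01", "13C025050901", "13C00699551")

# Pass 1: \D removes every character that is not a digit
digits_only <- gsub("\\D", "", ids)

# Pass 2: 01$ removes "01" only when it sits at the end of the string
trimmed <- sub("01$", "", digits_only)

# For this input the result agrees with the one-step pattern
identical(trimmed, gsub("01$|\\D", "", ids))
```

The one-step pattern is shorter; the split version may be easier to debug when one of the two rules misbehaves.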

Upvotes: 2

Federico Piazza

Reputation: 30985

I'm not sure how to do it in R, but you can use this regex:

-\d+$|\D
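
For reference, a sketch of how this pattern could be applied in R, assuming gsub with the backslashes doubled inside the string literal (the variable name res is made up here). Note that it removes a hyphen followed by trailing digits, so it handles "M13X01692-01" but leaves a bare trailing "01" in place, as in "367025001".

```r
ids <- c("367025001", "CT_341796001", "M13X01692-01", "13C025050901", "13C00699551")

# Backslashes are doubled inside an R string literal
res <- gsub("-\\d+$|\\D", "", ids)

# TRUE for elements that still end in "01" after the substitution
grepl("01$", res)
```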

Upvotes: 0
