Wolfgang

Reputation: 55

How to use gsub() to replace an entire string when it contains a given substring

df <- data.frame(
  sample = c("A", "B", "C", "D"),
  class = c("A31", "A3", "A31,C", "A3,B"),
  value = c(5, 1, 6, 8)
)

In this df, I would like to replace every string containing A3 (e.g. A3,B) with just A3, without affecting A31 and A31,C. How can I do that?

I’ve tried df$class <- gsub("A3.*", "A3", df$class), but this also changes A31 and A31,C.

Upvotes: 0

Views: 123

Answers (1)

akrun

Reputation: 886948

We could capture the upper-case letter ([A-Z]) followed by a digit as a group ((...)) that precedes one or more further digits (\\d+) up to the end ($) of the string, and replace the match with the backreference (\\1) to the captured group. This shortens A31 to A3 and leaves A3, A31,C and A3,B unchanged.

library(dplyr)
library(stringr)
df <- df %>%
    mutate(class = str_replace(class, "^([A-Z]\\d)\\d+$", "\\1"))

-output

df
  sample class value
1      A    A3     5
2      B    A3     1
3      C A31,C     6
4      D  A3,B     8
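
If the goal is instead the one stated in the question (collapse A3,B to A3 while leaving A31 and A31,C alone), one possible base R sketch is shown below. It assumes the classes are comma-separated tokens without surrounding spaces; that assumption is mine and is not stated in the question.

```r
df <- data.frame(
  sample = c("A", "B", "C", "D"),
  class = c("A31", "A3", "A31,C", "A3,B"),
  value = c(5, 1, 6, 8)
)

# Assumption: tokens are separated by commas with no spaces.
# The pattern matches the whole string only when "A3" appears as a complete
# token: optionally preceded by "<anything>," and optionally followed by
# ",<anything>". "A31" fails because the "1" is neither a comma nor the end.
df$class <- sub("^(.*,)?A3(,.*)?$", "A3", df$class)

df$class
# [1] "A31"   "A3"    "A31,C" "A3"
```

Since the pattern is anchored at both ends, sub() and gsub() behave the same here; the original attempt "A3.*" failed because .* also consumes the trailing 1 in A31.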

Upvotes: 1
