Richie Cotton

Reputation: 121077

Match regular expression case insensitively, replace with specific case

I'm using regular expressions to replace some substrings. The replacement value reuses part of the match. I want to match case insensitively, but in the replacement, I want a lower case version of the thing that was matched.

library(stringi)
x <- "CatCATdog"
rx <- "(?i)(cat)(?-i)"
stri_replace_all_regex(x, rx, "{$1}")
# [1] "{Cat}{CAT}dog"

This is close to what I want, except the "cat"s should be lower case. That is, the output string should be "{cat}{cat}dog".

The following code doesn't work, but it shows my intention.

stri_replace_all_regex(x, rx, "{tolower($1)}") 

The following technique does work, but it is ugly, hard to generalize, and inefficient. The idea is to change the regular expression so that it matches the target text but not the already-replaced values (that is, "cat" but not "{cat}"). The loop then finds the first match in each input string, locates it, replaces that substring, and repeats until no matches remain. It's awful.

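x <- "CatCATdog"
# Match "cat" in any case, but skip occurrences already wrapped in braces.
rx <- "(?i)((?<!\\{)cat(?!\\}))(?-i)"
repeat {
  # Which input strings still contain an unwrapped match?
  detected <- stri_detect_regex(x, rx)
  if (!any(detected)) {
    break
  }
  # Locate and extract the first remaining match in each of those strings.
  index <- stri_locate_first_regex(x[detected], rx)
  match <- tolower(stri_match_first_regex(x[detected], rx)[, 2])
  # match already covers only the detected strings, so it is not subset again.
  stri_sub(x[detected], index[, 1], index[, 2]) <- paste0("{", match, "}")
}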

I feel like there must be a better way.

How do I replace case insensitive matches with lower case values?


Thanks to inspiration from the comments, I discovered that the thing I'm looking for is "replacement text case conversion".

Upvotes: 2

Views: 187

Answers (2)

user2957945

Reputation: 2413

You can use \\L in the replacement string to convert the match to lower case. This works in gsub when perl=TRUE is set:

gsub(rx, "{\\L\\1}", x, perl=TRUE) 
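For reference, here is a minimal self-contained sketch of the same idea, reusing the input string from the question. The variable names lower and upper are only illustrative. With perl = TRUE, base R's gsub also accepts \\U to convert to upper case and \\E to end the conversion.

```r
x <- "CatCATdog"
rx <- "(?i)(cat)"

# \\L lower-cases everything that follows it in the replacement.
lower <- gsub(rx, "{\\L\\1}", x, perl = TRUE)
lower
# [1] "{cat}{cat}dog"

# \\U upper-cases instead; \\E ends the case conversion explicitly.
upper <- gsub(rx, "{\\U\\1\\E}", x, perl = TRUE)
upper
# [1] "{CAT}{CAT}dog"
```

Without perl = TRUE the case-conversion escapes are not interpreted, so the flag is the important part.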

Upvotes: 4

Wiktor Stribiżew

Reputation: 626871

If you need to perform any kind of string manipulation you may use gsubfn:

> library(gsubfn)
> rx <- "(?i)cat"
> s = "CatCATdog"
> gsubfn(rx, ~ paste0("{",tolower(x),"}"), s, backref=0)
[1] "{cat}{cat}dog"

You can use gsubfn much as you would use an anonymous callback passed to String#replace in JavaScript: you may declare arguments for the capturing groups with function(args) and perform more sophisticated manipulations inside the function body.
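If installing gsubfn is not an option, the same callback idea can be roughly sketched in base R with gregexpr and regmatches<-. This is an alternative technique, not what gsubfn does internally, and the helper name wrap_lower is made up for illustration.

```r
x <- "CatCATdog"

# Find every case-insensitive occurrence of "cat".
m <- gregexpr("cat", x, ignore.case = TRUE)

# Hypothetical callback: lower-case each hit and wrap it in braces.
wrap_lower <- function(hits) sprintf("{%s}", tolower(hits))

# Apply the callback to the matches of each string and write them back.
result <- x
regmatches(result, m) <- lapply(regmatches(result, m), wrap_lower)
result
# [1] "{cat}{cat}dog"
```

Because the callback is an ordinary R function, any string manipulation can go inside it, which is the main attraction of this style over a fixed replacement string.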

Upvotes: 5
