user3685285

R gsub regex Pascal Case to Camel Case

I want to write a gsub function using R regexes that replaces every capital letter in my string with an underscore followed by its lowercase variant. In a separate gsub, I want to replace the first letter with its lowercase variant. The function should behave like this:

pascal_to_camel("PaymentDate") -> "payment_date"
pascal_to_camel("AccountsOnFile") -> "accounts_on_file"
pascal_to_camel("LastDateOfReturn") -> "last_date_of_return"

The problem is that I don't know how to apply tolower to the "\\1" backreference captured by the regex.

I have something like this:

name_format = function(x) gsub("([A-Z])", paste0("_", tolower("\\1")), gsub("^([A-Z])", tolower("\\1"), x))

But it applies tolower to the literal string "\\1" instead of to the matched text.
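A quick console check shows why: R evaluates tolower("\\1") before gsub() is ever called, and "\\1" (a backslash and a digit) contains no letters, so it comes back unchanged. The sketch below reuses the question's name_format verbatim on one of the question's own examples.

```r
# tolower() runs first; "\\1" has no letters, so the result is "\\1" again
identical(tolower("\\1"), "\\1")  # TRUE

# The question's function, unchanged: both replacements reduce to plain
# backreferences, so nothing is ever lowercased
name_format <- function(x) gsub("([A-Z])", paste0("_", tolower("\\1")),
                                gsub("^([A-Z])", tolower("\\1"), x))

name_format("PaymentDate")  # "_Payment_Date"
```

The inner gsub becomes a no-op (it replaces the first capital with itself), and the outer one only prefixes each capital with "_".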

Answers (2)

Wiktor Stribiżew

You may use the following solution (converted from Python; see the "Elegant Python function to convert CamelCase to snake_case?" post):

> pascal_to_camel <- function(x) tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", gsub("(.)([A-Z][a-z]+)", "\\1_\\2", x)))
> pascal_to_camel("PaymentDate")
[1] "payment_date"
> pascal_to_camel("AccountsOnFile")
[1] "accounts_on_file"
> pascal_to_camel("LastDateOfReturn")
[1] "last_date_of_return"

Explanation

  • gsub("(.)([A-Z][a-z]+)", "\\1_\\2", x) - runs first and inserts _ between any character and an uppercase ASCII letter that is followed by one or more lowercase ASCII letters (call the output y)
  • gsub("([a-z0-9])([A-Z])", "\\1_\\2", y) - inserts _ between a lowercase ASCII letter or a digit and an uppercase ASCII letter (call the output z)
  • tolower(z) - turns the whole result to lower case.
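A minimal sketch tracing the intermediate values y and z for one of the question's inputs. It also shows why the second gsub is needed: regex matches cannot overlap, so step 1 alone misses some word boundaries.

```r
x <- "LastDateOfReturn"

# Step 1: "_" before each capital that starts a capital-plus-lowercase word.
# The match "tDate" consumes the "e", so "(.)" cannot reuse it and "Of" is
# not split off here.
y <- gsub("(.)([A-Z][a-z]+)", "\\1_\\2", x)
y           # "Last_DateOf_Return"

# Step 2: split the remaining lowercase-or-digit / capital boundaries
z <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", y)
z           # "Last_Date_Of_Return"

tolower(z)  # "last_date_of_return"
```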

The same regexes with Unicode support (these need perl=TRUE; \p{Lu} matches any uppercase Unicode letter and \p{Ll} matches any lowercase Unicode letter):

pascal_to_camel_uni <- function(x) {
  tolower(gsub("([\\p{Ll}0-9])(\\p{Lu})", "\\1_\\2",
               gsub("(.)(\\p{Lu}\\p{Ll}+)", "\\1_\\2", x, perl = TRUE),
               perl = TRUE))
}
pascal_to_camel_uni("ДеньОплаты")
## => [1] "день_оплаты"

Srdjan M.

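This uses two regexes with perl = TRUE: first (?!^[A-Z])([A-Z]) is replaced with _\\L\\1, then ([A-Z]) is replaced with \\L\\1. With perl = TRUE, \\L in a replacement string lowercases the rest of the replacement, so the case change is applied to the matched text: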

name_format <- function(x) gsub("([A-Z])", perl = TRUE, "\\L\\1", gsub("(?!^[A-Z])([A-Z])", perl = TRUE, "_\\L\\1", x))
> name_format("PaymentDate")
[1] "payment_date"
> name_format("AccountsOnFile")
[1] "accounts_on_file"
> name_format("LastDateOfReturn")
[1] "last_date_of_return"
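A sketch of the same name_format split into its two steps (arguments reordered, behaviour unchanged), so the intermediate result is visible. The first answer's pascal_to_camel is copied in only for comparison on "HTTPResponse", a run-of-capitals input that is not among the question's examples and on which the two answers differ.

```r
name_format <- function(x) {
  # Step 1: "_" + lowercase for every capital except one at the string start
  step1 <- gsub("(?!^[A-Z])([A-Z])", "_\\L\\1", x, perl = TRUE)
  # Step 2: lowercase whatever capital remains (only a leading one can)
  gsub("([A-Z])", "\\L\\1", step1, perl = TRUE)
}

# First answer's function, copied for comparison
pascal_to_camel <- function(x) tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2",
                                            gsub("(.)([A-Z][a-z]+)", "\\1_\\2", x)))

gsub("(?!^[A-Z])([A-Z])", "_\\L\\1", "PaymentDate", perl = TRUE)  # "Payment_date"
name_format("PaymentDate")       # "payment_date"

# Each capital in a run is split off individually by this answer
name_format("HTTPResponse")      # "h_t_t_p_response"
pascal_to_camel("HTTPResponse")  # "http_response"
```

Both give identical results on the three inputs from the question.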
