Lex

Reputation: 891

Trying to space words using Regex

I have a regex that inserts spaces between words correctly, but it fails when the input starts with a capitalized short code (an acronym).

What I'm trying to do is turn something like "TSTApplicationType" into "TST Application Type".

Currently, I'm using Regex.Replace(value, "([a-z])_?([A-Z])", "$1 $2") to add the spaces. However, this only produces "TSTApplication Type", because the pattern matches only where a lowercase letter is followed by an uppercase one.

Upvotes: 1

Views: 46

Answers (2)

Kao

Reputation: 106

If you don't mind taking a dependency on Humanizer, it can also do this: call .Humanize() on the string. Note that this does not preserve the original casing, so it is mainly an option if you also want to change the casing.

"TSTApplicationType".Humanize(LetterCasing.Title); // TST Application Type

Upvotes: 0

Wiktor Stribiżew

Reputation: 627410

You may use either of the two:

// Approach 1: match each word except the last and append a space to it
Regex.Replace(text, @"\p{Lu}{2,}(?=\p{Lu})|(?>\p{Lu}\p{Ll}*)(?!$)", "$& ");
// Approach 2: insert a space at each zero-width word boundary
Regex.Replace(text, @"(?<=\p{Lu})(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})", " ");

See regex demo #1 and regex demo #2

Details on Approach 1

  • \p{Lu}{2,}(?=\p{Lu})|(?>\p{Lu}\p{Ll}*)(?!$) matches
    • \p{Lu}{2,}(?=\p{Lu}) - 2 or more uppercase letters that are followed by another uppercase letter
    • | - or
    • (?>\p{Lu}\p{Ll}*)(?!$) - an uppercase letter and then 0 or more lowercase letters, provided the match does not end at the end of the string. The atomic group (?>...) stops the engine from giving back lowercase letters to satisfy (?!$).
  • The replacement is the whole match (referenced with $&) followed by a space.
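As a minimal sketch of Approach 1 applied to the input from the question, the snippet below wraps the call in a helper. The name SpaceByMatch is hypothetical and used only for illustration; the pattern and replacement are the ones given above.

```csharp
using System;
using System.Text.RegularExpressions;

// Approach 1: match every word except the last one and append a space to it.
// SpaceByMatch is a hypothetical helper name, not part of any library.
static string SpaceByMatch(string text) =>
    Regex.Replace(
        text,
        @"\p{Lu}{2,}(?=\p{Lu})|(?>\p{Lu}\p{Ll}*)(?!$)",
        "$& "); // $& refers to the whole match

Console.WriteLine(SpaceByMatch("TSTApplicationType")); // TST Application Type
Console.WriteLine(SpaceByMatch("XMLHttpRequest"));     // XML Http Request
```

Because the first alternative requires an uppercase letter after the run, an input that ends in an acronym (for example "ApplicationTST") is worth testing separately before relying on this pattern.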

Details on Approach 2

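This is a common approach that inserts a space at two kinds of position. The first alternative, (?<=\p{Lu})(?=\p{Lu}\p{Ll}), matches between an uppercase letter and an uppercase letter that is followed by a lowercase letter, which is where an acronym ends and the next word begins. The second alternative, (?<=\p{Ll})(?=\p{Lu}), matches between a lowercase letter and an uppercase letter. Both are zero-width, so no characters are consumed and the replacement is just a space.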
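The lookaround version can be sketched the same way. SpaceByBoundary is a hypothetical helper name; the pattern is the one from Approach 2. Since it only looks at neighbouring characters, it also leaves a trailing acronym intact.

```csharp
using System;
using System.Text.RegularExpressions;

// Approach 2: insert a space at zero-width positions between words.
// SpaceByBoundary is a hypothetical helper name, not part of any library.
static string SpaceByBoundary(string text) =>
    Regex.Replace(
        text,
        @"(?<=\p{Lu})(?=\p{Lu}\p{Ll})|(?<=\p{Ll})(?=\p{Lu})",
        " ");

Console.WriteLine(SpaceByBoundary("TSTApplicationType")); // TST Application Type
Console.WriteLine(SpaceByBoundary("ApplicationTST"));     // Application TST
```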

Upvotes: 2
