xhammer

Reputation: 145

Building a regex for weird word combination

Hello, I want to create a regex to find words that have a capital letter in the middle of them. Can anyone please tell me how to do it?

For example: I went to a school forEducation and then didn't study.

I want to match words like forEducation. I think [a-z] and [A-Z] are used to match letters, but I can't get it to work. I am thinking of combining the two, but then how would the pattern check for a capital letter inside the word? Also, I don't want to select words that start with a capital letter, because those are valid.

Reason: this happens because the text is scraped from websites that have no proper formatting. I get these words as text and want to replace them.

Upvotes: 0

Views: 257

Answers (2)

Tim Pietzcker

Reputation: 336478

If I'm guessing correctly, you want to split words that are camelCased, right?

Then use

ResultString = Regex.Replace(SubjectString, 
    "(?<=\p{Ll})  # Assert that the previous character is a lowercase letter" & chr(10) & _
    "(?=\p{Lu})   # Assert that the next character is an uppercase letter", 
    " ", RegexOptions.IgnorePatternWhitespace)

This changes

I went to a school forEducation and then didn't study.

into

I went to a school for Education and then didn't study.
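
For readers not on .NET, the same lookaround idea can be sketched in Python. This is a port under assumptions, not part of the original answer: Python's built-in `re` module does not support `\p{Ll}` and `\p{Lu}`, so the sketch falls back to the ASCII classes `[a-z]` and `[A-Z]`, and the names `CAMEL_BOUNDARY` and `split_camel_case` are made up for illustration.

```python
import re

# Zero-width position between a lowercase and an uppercase letter.
# Assumption: ASCII letters only, because the built-in re module
# has no \p{Ll} / \p{Lu} classes.
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])")


def split_camel_case(text: str) -> str:
    """Insert a space at every lowercase-to-uppercase boundary."""
    return CAMEL_BOUNDARY.sub(" ", text)


print(split_camel_case("I went to a school forEducation and then didn't study."))
```

Because both assertions are zero-width, no characters are consumed; only a space is inserted at the boundary, so words that merely start with a capital letter are left alone.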

Upvotes: 1

Vigrond

Reputation: 8198

[a-zA-Z][a-z]*[A-Z][a-zA-Z]*

Start with any letter, followed by zero or more lowercase letters, followed by a capital letter, followed by zero or more letters of either case.
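
A quick way to see what this pattern selects is to run it over a sample string. The Python sketch below uses the pattern unchanged and, for comparison, a stricter variant that is only an assumption about what the asker wants (lowercase start, whole word); that variant and the names `MIXED_CASE`, `LOWER_START`, and `sample` are hypothetical and not from the answer.

```python
import re

# Pattern from the answer above, unchanged.
MIXED_CASE = re.compile(r"[a-zA-Z][a-z]*[A-Z][a-zA-Z]*")

# Hypothetical stricter variant (not from the answer): the word must
# start with lowercase letters and is matched as a whole word.
LOWER_START = re.compile(r"\b[a-z]+[A-Z][a-zA-Z]*\b")

sample = "McDonald went to a school forEducation."

print(MIXED_CASE.findall(sample))   # ['McDonald', 'forEducation']
print(LOWER_START.findall(sample))  # ['forEducation']
```

The first pattern also picks up names such as McDonald, because its first character class allows a capital letter. If words that begin with a capital should be skipped, something like the second pattern may be closer to the requirement.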

Upvotes: 1
