Reputation: 4251
Say I have the string str = "ASimpleNoSpaceTitle". I can't seem to wrap my head around how to use a regexp to split it and extract all the capitalized words, so that I get ["A", "Simple", "No", "Space", "Title"].
What's a regular expression that will do the job?
UPDATE: What about a string that mixes words with and without spaces and upper-case letters? For example, "ASimpleNoSpaceTitle and a subtitle" should become ["A", "Simple", "No", "Space", "Title", "and", "a", "subtitle"].
Upvotes: 1
Views: 1475
Reputation: 1767
Using String#scan with character class ranges will get you what you want with a simple, easy-to-understand regex:
str = "ASimpleNoSpaceTitle"
str.scan(/[A-Z][a-z]*/) # => ["A", "Simple", "No", "Space", "Title"]
You could use the POSIX bracket expressions [[:upper:]] and [[:lower:]], which would allow your regex to also deal with non-ASCII letters such as À or ç:
str = "ÀSimpleNoSpaçeTitle"
str.scan(/[A-Z][a-z]*/) # => ["Simple", "No", "Spa", "Title"]
str.scan(/[[:upper:]][[:lower:]]*/) # => ["À", "Simple", "No", "Spaçe", "Title"]
To allow words to begin with a lowercase letter when not preceded by another letter, you can use this variation:
str = "ASimpleNoSpaceTitle and a subtitle"
str.scan(/[A-Za-z][a-z]*/) # => ["A", "Simple", "No", "Space", "Title", "and", "a", "subtitle"]
# OR
str.scan(/[[:alpha:]][[:lower:]]*/)
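If you need this in more than one place, the POSIX-bracket variant can be wrapped in a small method. This is only a sketch: the name split_words is made up for illustration and is not part of Ruby or of the answer above.

```ruby
# Hypothetical helper; the name split_words is illustrative only.
def split_words(str)
  # One letter of any case, then zero or more lowercase letters.
  str.scan(/[[:alpha:]][[:lower:]]*/)
end

p split_words("ASimpleNoSpaceTitle and a subtitle")
# => ["A", "Simple", "No", "Space", "Title", "and", "a", "subtitle"]
p split_words("ÀSimpleNoSpaçeTitle")
# => ["À", "Simple", "No", "Spaçe", "Title"]
```

Note that scan keeps only what the pattern matches, so digits and punctuation are silently dropped from the result.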
Upvotes: 6
Reputation: 121000
Another way is to split before every uppercase letter (\p{Lu}) or on runs of whitespace:
"ASimpleNoSpaceTitle and a subtitle".split(/(?=\p{Lu})|\s+/)
#⇒ ["A","Simple","No","Space","Title","and","a","subtitle"]
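One practical difference between this split and a scan-based approach is what happens to characters that are not letters. The sketch below compares the two on an input containing a digit; the method names by_split and by_scan and the sample string are assumptions made for illustration.

```ruby
# Hypothetical helpers; the names are illustrative only.
def by_split(str)
  # Cut before each uppercase letter, or on (and discard) whitespace.
  str.split(/(?=\p{Lu})|\s+/)
end

def by_scan(str)
  # Keep only runs that look like a word: a letter plus lowercase letters.
  str.scan(/[[:alpha:]][[:lower:]]*/)
end

sample = "Version2Title and a subtitle"
p by_split(sample) # => ["Version2", "Title", "and", "a", "subtitle"]
p by_scan(sample)  # => ["Version", "Title", "and", "a", "subtitle"]
```

split keeps everything the separator does not consume, so the "2" survives; scan keeps only what the pattern matches, so it disappears. Which behaviour you want depends on your input.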
Upvotes: 2
Reputation: 110675
"ABSimpleNoSpaceTitle".split(/(?=[[:upper:]])/)
#=> ["A", "B", "Simple", "No", "Space", "Title"]
(?=[[:upper:]]) is a positive lookahead: it matches the empty position just before an uppercase letter without consuming that letter, so split cuts the string there and no characters are lost.
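To see why the lookahead matters, compare it with a pattern that actually consumes the uppercase letter. A minimal sketch; the constant names are made up for illustration.

```ruby
STR = "ABSimpleNoSpaceTitle"

# Zero-width lookahead: the separator is an empty position, nothing is removed.
WITH_LOOKAHEAD = STR.split(/(?=[[:upper:]])/)
p WITH_LOOKAHEAD # => ["A", "B", "Simple", "No", "Space", "Title"]

# Consuming match: the uppercase letter itself is the separator and is discarded.
WITHOUT_LOOKAHEAD = STR.split(/[[:upper:]]/)
p WITHOUT_LOOKAHEAD # => ["", "", "", "imple", "o", "pace", "itle"]
```

Note that consecutive capitals such as "AB" are split into separate one-letter words either way.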
Upvotes: 4
Reputation: 7679
Here is one way to do it: pass this regex to the built-in String#scan method.
/[[:upper:]](?:[[:lower:]]+)?/
The regex matches an uppercase letter, [[:upper:]], optionally followed by one or more lowercase letters, (?:[[:lower:]]+)?.
scan returns every occurrence of the pattern in the string:
irb(main):001:0> str = "ASimpleNoSpaceTitle"
=> "ASimpleNoSpaceTitle"
irb(main):050:0> str.scan(/[[:upper:]](?:[[:lower:]]+)?/)
=> ["A", "Simple", "No", "Space", "Title"]
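As far as I can tell, the optional non-capturing group (?:[[:lower:]]+)? accepts exactly the same strings as the shorter [[:lower:]]*, since both mean "zero or more lowercase letters". A quick check on a few sample inputs (the constant names and samples are only for illustration):

```ruby
OPTIONAL_GROUP = /[[:upper:]](?:[[:lower:]]+)?/
STAR           = /[[:upper:]][[:lower:]]*/

SAMPLES = ["ASimpleNoSpaceTitle", "ABSimple", "already lowercase"]

SAMPLES.each do |s|
  # Prints true when both patterns extract the same words.
  puts "#{s.inspect}: #{s.scan(OPTIONAL_GROUP) == s.scan(STAR)}"
end
```

This only compares the two patterns on the samples given; it is a sanity check, not a proof.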
Upvotes: 0