KA01

Reputation: 4251

Ruby regular expression to extract words in a string that contain no spaces

Say I have the string str = "ASimpleNoSpaceTitle". I can't seem to wrap my head around how to use regexp to split and extract all the capitalized words so that I get ["A", "Simple", "No", "Space", "Title"].

What's a regular expression that will do the job?

UPDATE: What about a string that mixes words with and without spaces or capital letters? For example, turning "ASimpleNoSpaceTitle and a subtitle" into ["A", "Simple", "No", "Space", "Title", "and", "a", "subtitle"].

Upvotes: 1

Views: 1475

Answers (4)

philomory

Reputation: 1767

Using String#scan with character class ranges will get you what you want with a simple, easy-to-understand regex:

str = "ASimpleNoSpaceTitle"
str.scan(/[A-Z][a-z]*/) # => ["A", "Simple", "No", "Space", "Title"]

You could use the POSIX bracket expressions [[:upper:]] and [[:lower:]], which would allow your regex to also deal with non-ASCII letters such as À or ç:

str = "ÀSimpleNoSpaçeTitle"
str.scan(/[A-Z][a-z]*/) # => ["Simple", "No", "Spa", "Title"]
str.scan(/[[:upper:]][[:lower:]]*/) # => ["À", "Simple", "No", "Spaçe", "Title"]

To allow words to begin with a lowercase letter when not preceded by another letter, you can use this variation:

str = "ASimpleNoSpaceTitle and a subtitle"
str.scan(/[A-Za-z][a-z]*/) # => ["A", "Simple", "No", "Space", "Title", "and", "a", "subtitle"]
# OR
str.scan(/[[:alpha:]][[:lower:]]*/)
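The [[:alpha:]] variant has no sample output above, so here is a minimal sketch that runs it on both the mixed-case string and the non-ASCII string from earlier in this answer. The helper name extract_words is hypothetical (it is not part of the answer), and the example assumes UTF-8 strings, which is Ruby's default.

```ruby
# Hypothetical helper (not from the answer): a word is any letter followed by
# zero or more lowercase letters, using Unicode-aware POSIX bracket expressions.
def extract_words(str)
  str.scan(/[[:alpha:]][[:lower:]]*/)
end

p extract_words("ASimpleNoSpaceTitle and a subtitle")
# => ["A", "Simple", "No", "Space", "Title", "and", "a", "subtitle"]

p extract_words("ÀSimpleNoSpaçeTitle")
# => ["À", "Simple", "No", "Spaçe", "Title"]
```

Because the first character may be any letter, a lowercase word after a space starts a new match, while a capital in the middle of a run still ends the previous word.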

Upvotes: 6

Aleksei Matiushkin

Reputation: 121000

A concise way to do this, using the Unicode uppercase-letter property \p{Lu}:

"ASimpleNoSpaceTitle and a subtitle".split(/(?=\p{Lu})|\s+/)
#⇒ ["A","Simple","No","Space","Title","and","a","subtitle"]
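One practical difference between this split and a scan-based approach is what happens to characters that are not letters. The sketch below is an illustration only: the constant names and the sample string are made up for this example. split keeps digits attached to the word they follow, while scan with /[A-Z][a-z]*/ drops them.

```ruby
# Illustrative constants (names are assumptions, not from the answers).
SPLIT_RE = /(?=\p{Lu})|\s+/  # split before an uppercase letter, or on whitespace
SCAN_RE  = /[A-Z][a-z]*/      # match an uppercase letter plus trailing lowercase letters

sample = "Top10HitsOf2016"

p sample.split(SPLIT_RE) # => ["Top10", "Hits", "Of2016"]
p sample.scan(SCAN_RE)   # => ["Top", "Hits", "Of"]
```

Which behavior is preferable depends on whether digits and punctuation should survive in the output.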

Upvotes: 2

Cary Swoveland

Reputation: 110675

"ABSimpleNoSpaceTitle".split(/(?=[[:upper:]])/)
  #=> ["A", "B", "Simple", "No", "Space", "Title"]

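(?=[[:upper:]]) is a positive lookahead: it matches the zero-width position immediately before each capital letter without consuming the letter, so the string is split at those positions and no characters are lost.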
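To see that the lookahead consumes nothing, here is a small sketch that joins the pieces back together and checks that they reproduce the input. The method name split_before_capitals is hypothetical and exists only for this illustration.

```ruby
# Hypothetical wrapper around the split shown above.
def split_before_capitals(str)
  str.split(/(?=[[:upper:]])/)
end

str = "ABSimpleNoSpaceTitle"
parts = split_before_capitals(str)

p parts             # => ["A", "B", "Simple", "No", "Space", "Title"]
p parts.join == str # => true, because the split points are zero-width
```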

Upvotes: 4

z atef

Reputation: 7679

Here is one way to do it.

Pass this regex to the built-in String#scan method:

/[[:upper:]](?:[[:lower:]]+)?/

The regex matches an uppercase letter [[:upper:]] that is optionally followed by one or more lowercase letters (?:[[:lower:]]+)?.

scan returns every non-overlapping match in the string, not just the first one.

irb(main):001:0> str = "ASimpleNoSpaceTitle"
=> "ASimpleNoSpaceTitle"

irb(main):050:0> str.scan(/[[:upper:]](?:[[:lower:]]+)?/)
=> ["A", "Simple", "No", "Space", "Title"]
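As a side note, the optional group (?:[[:lower:]]+)? matches the same text as the shorter [[:lower:]]*, since both mean "zero or more lowercase letters". A minimal sketch comparing the two (the constant names are made up for this example):

```ruby
# Illustrative constants (names are assumptions, not from the answer).
VERBOSE_RE = /[[:upper:]](?:[[:lower:]]+)?/  # optional group of one or more lowercase letters
COMPACT_RE = /[[:upper:]][[:lower:]]*/       # zero or more lowercase letters

str = "ASimpleNoSpaceTitle"

p str.scan(VERBOSE_RE)                         # => ["A", "Simple", "No", "Space", "Title"]
p str.scan(VERBOSE_RE) == str.scan(COMPACT_RE) # => true
```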

Upvotes: 0
