timpone

Reputation: 19969

How to scan for substrings with specific characters in them

This is a follow-up to my earlier question, "How to scan and return a set of words with specific characters in them in Ruby".

We want to scan for words starting with a certain set of letters and then return them in an array. Something like this:

 b="h ARCabc s and other ARC12".scan(/\w+ARC*\w+/)

and get back:

["ARCabc","ARC12"]

How would I do this (and I know this is very similar to what I asked yesterday)?

Upvotes: 0

Views: 86

Answers (2)

Cary Swoveland

Reputation: 110725

For good readability, you could split the string into words and then select the ones you want:

str = "h ARCabc s and other ARC12"
target = "ARC"

str.split.select { |w| w.include?(target) }
  #=> ["ARCabc", "ARC12"] 

If the words must begin with target:

str.split.select { |w| w.start_with?(target) }
  #=> ["ARCabc", "ARC12"]
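
To see where the two predicates differ, here is a small sketch. The input string and the word "myARCade" are made up for illustration and are not from the question:

```ruby
# Hypothetical input: "myARCade" contains ARC but does not start with it.
str = "h ARCabc myARCade and other ARC12"
target = "ARC"

words = str.split # splits on whitespace only

containing = words.select { |w| w.include?(target) }
starting   = words.select { |w| w.start_with?(target) }

p containing #=> ["ARCabc", "myARCade", "ARC12"]
p starting   #=> ["ARCabc", "ARC12"]
```

Since split breaks only on whitespace, any punctuation attached to a word (a trailing comma, say) stays part of that word.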

Upvotes: 1

Wiktor Stribiżew

Reputation: 627103

Just use the following regex:

\bARC\w*\b

or, to allow only letters and digits after ARC (a word containing an underscore is then not matched at all):

\bARC[[:alnum:]]*\b


The regex matches:

  • \b - a word boundary (ARC at the start of a word only)
  • ARC - a fixed sequence of characters
  • \w* - zero or more letters, digits, or underscores. NOTE: to limit matches to letters and digits only, replace \w* with [[:alnum:]]*.
  • \b - end of word (trailing) boundary.

For the sample string, the output is ARCabc and ARC12 (IDEONE demo).
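
As a rough sketch of how these patterns plug into String#scan: the second input string below is an invented example, chosen to show what the leading \b and the [[:alnum:]] variant each rule out.

```ruby
str = "h ARCabc s and other ARC12"

p str.scan(/\bARC\w*\b/)          #=> ["ARCabc", "ARC12"]
p str.scan(/\bARC[[:alnum:]]*\b/) #=> ["ARCabc", "ARC12"]

# Hypothetical input: ARC in mid-word, and ARC followed by an underscore.
tricky = "xARCy ARC_1 ARC12"

with_word  = tricky.scan(/\bARC\w*\b/)          # "xARCy" fails the leading \b
with_alnum = tricky.scan(/\bARC[[:alnum:]]*\b/) # "ARC_1" has no boundary after ARC

p with_word  #=> ["ARC_1", "ARC12"]
p with_alnum #=> ["ARC12"]
```

With the [[:alnum:]] variant, "ARC_1" is dropped entirely rather than truncated to "ARC", because there is no word boundary between C and the underscore.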

NOTE2: If you plan to match Unicode strings, consider using either of the following regexps:

  • \bARC\p{Word}*\b - this variation will match words with underscores after ARC
  • \bARC[\p{L}\p{M}\d]*\b - this regex will match words that only have digits and Unicode letters after ARC.
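
A minimal sketch of the two Unicode-aware variants, assuming UTF-8 strings. The sample text is invented for illustration, with non-ASCII letters written as \u escapes:

```ruby
# Hypothetical sample: "ARCstraße", "ARC_cafés" and "ARC42" as UTF-8 text.
text = "ARCstra\u00dfe ARC_caf\u00e9s ARC42"

# \p{Word} accepts Unicode letters, marks, digits and the underscore.
any_word_chars = text.scan(/\bARC\p{Word}*\b/)
# any_word_chars is ["ARCstraße", "ARC_cafés", "ARC42"]

# [\p{L}\p{M}\d] accepts Unicode letters, combining marks and digits only.
letters_digits = text.scan(/\bARC[\p{L}\p{M}\d]*\b/)
# letters_digits is ["ARCstraße", "ARC42"]
```

The second pattern skips "ARC_cafés" altogether, because the underscore directly after ARC is not in the character class and no word boundary separates C from it.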

Upvotes: 4
