Reputation: 594
I have this block of text:
XQuery programming language
C# programming language
declarative programming
XSLT programming language
Haskell programming language vs F* programming language
I want to retrieve the names of the programming languages.
I tried something like
matches = string.scan('/(\w)*\sprogramming language/i')
But that gives me this:
[]
[]
[]
[]
Whereas I want an array like this:
['XQuery','C#','XSLT','Haskell']
What am I doing wrong?
Upvotes: 3
Views: 793
Reputation: 110675
You need only make a couple of small changes to what you have. I've assumed the text you want always starts at the beginning of a line (because you've excluded 'F*') and is separated from "programming language" by one or more spaces.
text =<<_
XQuery programming language
C# programming language
declarative programming
XSLT programming language
Haskell programming language vs F* programming language
_
text.scan(/(^.+?)\s+programming language/i).flatten
#=> ["XQuery", "C#", "XSLT", "Haskell"]
Notes:
^ in the regex is the beginning-of-line anchor. Because it is zero-width, it can sit inside the capture group, as in (^.+?), or just before it, as in ^(.+?); both capture the same text. The third line contains no match, so scan returns nothing for it. The ? after .+ makes .+ "non-greedy". Without it, the last element of the array returned would be:
"Haskell programming language vs F*"
Upvotes: 1
Reputation: 174696
You need to remove the quotes around the regex literal /.../i; inside quotes it is an ordinary String, which scan searches for literally:
string.scan(/\S+(?=\sprogramming language)/i)
\S+ matches one or more non-space characters. (?=\sprogramming language) is a positive lookahead which asserts that the match must be followed by a whitespace character and the string "programming language". The i modifier makes the regex engine do a case-insensitive match.
irb(main):001:0> str = "XQuery programming language
irb(main):002:0" C# programming language
irb(main):003:0" declarative programming
irb(main):004:0" XSLT programming language
irb(main):005:0" Haskell programming language vs F* programming language"
=> "XQuery programming language\nC# programming language\ndeclarative programming\nXSLT programming language\nHaskell programming language vs F* programming language"
irb(main):007:0> str.scan(/\S+(?=\sprogramming language)/i)
=> ["XQuery", "C#", "XSLT", "Haskell", "F*"]
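As a rough side-by-side, the sketch below compares the quoted pattern from the question with the regex literal used here. It also tries one variation that is my own assumption rather than part of the session above: prefixing ^ so that only names at the start of a line are kept, which drops "F*" and matches the array the question asked for. The variable names are hypothetical.

```ruby
str = "XQuery programming language\nC# programming language\n" \
      "declarative programming\nXSLT programming language\n" \
      "Haskell programming language vs F* programming language"

# A quoted pattern is a String, so scan looks for that literal text.
quoted = str.scan('/(\w)*\sprogramming language/i')

# A Regexp literal with a lookahead returns the word before each phrase.
all_names = str.scan(/\S+(?=\sprogramming language)/i)

# Assumption: adding ^ keeps only names at the start of a line.
line_start = str.scan(/^\S+(?=\sprogramming language)/i)

p quoted      #=> []
p all_names   #=> ["XQuery", "C#", "XSLT", "Haskell", "F*"]
p line_start  #=> ["XQuery", "C#", "XSLT", "Haskell"]
```

Since the lookahead consumes nothing, there is no capture group and scan returns a flat array, so no flatten is needed.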
Upvotes: 6