Reputation: 5212
I would like a regexp that matches all groups of words (single words and sub-sentences) in a sentence whose words are separated by whitespace.
For example, I want a regexp such that
"foo bar bar2".scan(regexp)
returns:
['foo', 'bar', 'bar2', 'foo bar', 'bar bar2', 'foo bar bar2']
So far, I tried:
"foo bar bar2".scan(/\S*[\S]/)
(i.e. regexp = /\S*[\S]/)
which returns ['foo', 'bar', 'bar2'], and
"foo bar bar2".scan(/\S* [\S]+/)
(i.e. regexp = /\S* [\S]+/)
which returns ["foo bar", " bar2"].
Upvotes: 0
Views: 70
Reputation: 1
Mudasobwa posted a nice variation of this answer; check it out. I used combination, a built-in Array method. The procedure is almost the same:
string = "foo bar bar2"
groups = string.split
objects = []
# Collect the combinations of every size, from 1 word up to all words
(1..groups.size).each do |i|
  objects << groups.combination(i).to_a
end
results = objects.flatten(1).map { |e| e.join('-') }
puts results
Anyway, you can't do it with one regex: String#scan only returns non-overlapping matches, so a single pattern cannot return both "foo bar" and "bar bar2", let alone every combination of, say, 50 words. You will need to iterate over the words, as Mudasobwa showed.
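For completeness, here is a minimal sketch of how far a regex alone can go. A single scan cannot return every group, but a zero-width lookahead with a capture group consumes no characters, so its matches can overlap, and one such pattern per group size does work. It assumes words are separated by single spaces, and the helper name groups_of is hypothetical; the each_cons approach in Mudasobwa's answer is simpler.

```ruby
# Hypothetical helper: every run of exactly n consecutive words.
# (?<!\S) only allows a match to start at the beginning of a word;
# the lookahead (?=(...)) captures the run without consuming it,
# so the next attempt can start inside the previous match.
def groups_of(sentence, n)
  run = Array.new(n, '\S+').join(' ')           # e.g. n = 2 -> \S+ \S+
  pattern = Regexp.new("(?<!\\S)(?=(#{run}))")
  sentence.scan(pattern).flatten                # scan returns one array per capture group
end

sentence = "foo bar bar2"

# An ordinary pattern consumes what it matches, so "bar" is used only once
p sentence.scan(/\S* [\S]+/)   # ["foo bar", " bar2"]
p groups_of(sentence, 2)       # ["foo bar", "bar bar2"]

# One regex per group size, from 1 word up to the whole sentence
all = (1..sentence.split.size).flat_map { |n| groups_of(sentence, n) }
p all  # ["foo", "bar", "bar2", "foo bar", "bar bar2", "foo bar bar2"]
```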
I would start by doing this: the regex, if you want to use one, can be, for example, /\S+/. This regex matches words, and by words I mean groups of characters surrounded by whitespace.
With it you can scan your text, or you can simply split the string. There are many ways to do it, and in the end you will have an array of the words you want to combine.
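A quick check that scanning with /\S+/ and splitting give the same word array, even with irregular whitespace (the extra spaces and the tab in the sample string are made up for illustration):

```ruby
text = "  foo  bar\tbar2 "

by_scan  = text.scan(/\S+/)  # every run of non-whitespace characters
by_split = text.split        # no argument: split on runs of whitespace, ignoring leading blanks

p by_scan              # ["foo", "bar", "bar2"]
p by_scan == by_split  # true
```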
string = "foo bar bar2"
Then you split it into an array and apply the combination method to it.
groups = string.split
=> ["foo", "bar", "bar2"]
The combination method takes a number as its argument, and that number is the size of each combination: combination(2) combines the elements in groups of two, combination(1) in groups of one, and combination(0) yields a single empty group (this is why we start at 1).
You need to loop over all possible group sizes, saving the results in an array:
objects = []
# use the number of elements as the upper bound of the loop
(1..groups.size).each do |i|
  objects << groups.combination(i).to_a
end
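To see what combination returns for each size, and how it differs from Enumerable#each_cons (used in Mudasobwa's answer), here is a small check on the example words:

```ruby
groups = %w[foo bar bar2]

p groups.combination(0).to_a  # [[]]  (a single empty group)
p groups.combination(1).to_a  # [["foo"], ["bar"], ["bar2"]]
p groups.combination(2).to_a  # [["foo", "bar"], ["foo", "bar2"], ["bar", "bar2"]]

# each_cons only keeps runs of neighboring words, so ["foo", "bar2"] is absent
p groups.each_cons(2).to_a    # [["foo", "bar"], ["bar", "bar2"]]
```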
Now you just have to flatten the nested arrays by one level and join the words of each group into a single string (here separated by '-'):
results = objects.flatten(1).map { |e| e.join('-') }
That's it! You can run the code above (with an example using more words) here: https://repl.it/JLK9/1
PS: both the question and the answer mentioned above are missing one combination (foo-bar2).
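As a stylistic alternative (not the author's code), the loop and the flatten(1) step can be collapsed with Enumerable#flat_map, which concatenates the arrays returned by the block, so no intermediate objects array is needed:

```ruby
string = "foo bar bar2"
groups = string.split

# One array of combinations per size 1..n, concatenated into a single list
combos  = (1..groups.size).flat_map { |i| groups.combination(i).to_a }
results = combos.map { |e| e.join('-') }

puts results
```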
Upvotes: 0
Reputation: 121000
words = "foo bar bar2".scan(/\S+/)
result = 1.upto(words.length).map do |n|
words.each_cons(n).to_a
end.flatten(1)
#⇒ [["foo"], ["bar"], ["bar2"],
# ["foo", "bar"], ["bar", "bar2"],
# ["foo", "bar", "bar2"]]
result.map { |e| e.join(' ') }
#⇒ ["foo", "bar", "bar2", "foo bar", "bar bar2", "foo bar bar2"]
Here we used Enumerable#each_cons to get the result.
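each_cons(n) slides a window of n elements along the array one step at a time. A small sketch showing that it produces the same windows as slicing by hand from every valid start index:

```ruby
words = %w[foo bar bar2]
n = 2

windows = words.each_cons(n).to_a

# The same windows built by hand: start at each index i and take n elements
manual = (0..words.size - n).map { |i| words[i, n] }

p windows            # [["foo", "bar"], ["bar", "bar2"]]
p windows == manual  # true
```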
Upvotes: 3