Julien TASSIN

Reputation: 5212

Matching groups of words

I would like a regexp that matches all groups of words (single words and sub-sentences) in a sentence whose words are separated by whitespace.

Example:

"foo bar bar2".scan(regexp)

I want a regexp that will return:

['foo', 'bar', 'bar2', 'foo bar', 'bar bar2', 'foo bar bar2']

So far, I have tried:

"foo bar bar2".scan(/\S*[\S]/) (i.e. regexp=/\S*[\S]/) which returns ['foo', 'bar', 'bar2']

"foo bar bar2".scan(/\S* [\S]+/) (i.e. regexp=/\S* [\S]+/) which returns ["foo bar", " bar2"]

Upvotes: 0

Views: 70

Answers (2)

João Medeiros

Reputation: 1

Mudasobwa posted a nice variation of this answer. I used combination, a built-in Array method. The procedure is almost the same:

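    string = "foo bar bar2"
    groups = string.split
    objects = []

    # collect the combinations of every size, from 1 up to the word count
    for i in 1..groups.size
      objects << groups.combination(i).to_a
    end

    # flatten one level, then join each group of words into a single string
    results = objects.flatten(1).map { |e| e.join('-') }
    puts results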

Anyway, you can't do this with a single regex. Suppose you have 50 words and need to find all the combinations: one pattern can't enumerate them. You will need to iterate over the words, as Mudasobwa showed.
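One way to see the limitation, as a minimal sketch: String#scan resumes after the end of the previous match, so its matches never overlap. The helper name adjacent_pairs_by_scan below is hypothetical and exists only for this illustration.

```ruby
# adjacent_pairs_by_scan is a hypothetical helper name used only for this illustration
def adjacent_pairs_by_scan(sentence)
  # scan continues after the end of the previous match, so matches never overlap
  sentence.scan(/\S+ \S+/)
end

p adjacent_pairs_by_scan("foo bar bar2")
# prints ["foo bar"]; "bar bar2" is missing because "bar" was already consumed
```

Since "bar" belongs to both "foo bar" and "bar bar2", a single pass of this pattern cannot return both.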

Here is where I would start. If you want to use a regex, it can be something like /([^\s]\w+)/m, which matches words. By words I mean groups of characters surrounded by whitespace. Note that this pattern needs at least two characters per match, so it skips single-character words; /\S+/ picks those up as well.

With this you can scan your text, or you can simply split your string. There are many ways to do it, and in the end you will have an array of the words you want to combine.

    string = "foo bar bar2"

Then you split it into an array, to which you will apply the combination method.

    groups = string.split
    # => ["foo", "bar", "bar2"]

The combination method takes a number as its argument, and that number is the size of each combination. combination(2) combines the elements in groups of two, combination(1) in groups of one, and combination(0) in groups of zero (this is why we start the combinations at 1).
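As a quick illustration of those sizes, here is a small sketch using Array#combination on the example sentence. The helper name groups_of_size is hypothetical.

```ruby
# groups_of_size is a hypothetical helper name used only for this illustration
def groups_of_size(sentence, size)
  # combination(size) yields every subset with `size` elements, keeping the original order
  sentence.split.combination(size).to_a
end

p groups_of_size("foo bar bar2", 2)
# prints [["foo", "bar"], ["foo", "bar2"], ["bar", "bar2"]]
p groups_of_size("foo bar bar2", 0)
# prints [[]]
```

The size-0 call returns one empty group, which would become an empty string after joining, so skipping it keeps the results clean.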

You need to loop over all possible group sizes, saving the results in an array:

    objects = []

Use the number of elements as the upper bound of the loop:

    for i in 1..groups.size
      objects << groups.combination(i).to_a
    end

Now you just have to flatten the nested arrays by one level and join each group into a string, which gets rid of the commas and double quotes:

    results = objects.flatten(1).map { |e| e.join('-') }

That's it! You can run the code above (with an example using more words) here: https://repl.it/JLK9/1

PS: both the question and the mentioned answer are missing one combination (foo-bar2).
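To make that difference concrete, the sketch below compares every subset of the words (via combination) with only the runs of consecutive words (via each_cons). The helper names all_subsets and contiguous_runs are hypothetical.

```ruby
# all_subsets and contiguous_runs are hypothetical helper names for this comparison

# every subset of the words, for sizes 1 up to the word count
def all_subsets(words)
  (1..words.size).flat_map { |n| words.combination(n).to_a }
end

# only the groups of words that sit next to each other in the sentence
def contiguous_runs(words)
  (1..words.size).flat_map { |n| words.each_cons(n).to_a }
end

words = "foo bar bar2".split
p all_subsets(words) - contiguous_runs(words)
# prints [["foo", "bar2"]]
```

Which of the two is wanted depends on whether a group such as "foo bar2", which skips a word, counts as a sub-sentence; the expected output in the question lists only consecutive words.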

Upvotes: 0

Aleksei Matiushkin

Reputation: 121000

    words = "foo bar bar2".scan(/\S+/)
    result = 1.upto(words.length).map do |n|
      words.each_cons(n).to_a
    end.flatten(1)
    #⇒ [["foo"], ["bar"], ["bar2"],
    #   ["foo", "bar"], ["bar", "bar2"],
    #   ["foo", "bar", "bar2"]]

    result.map { |e| e.join(' ') }
    #⇒ ["foo", "bar", "bar2", "foo bar", "bar bar2", "foo bar bar2"]

Here we used Enumerable#each_cons to get to the result.
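For reuse, the same idea could be wrapped in a small method. This is only a sketch: the name word_groups is hypothetical, and it splits on whitespace with String#split rather than scanning with /\S+/.

```ruby
# word_groups is a hypothetical helper name, not part of the answer above
def word_groups(sentence)
  words = sentence.split
  # for each run length n, each_cons(n) yields every run of n consecutive words
  (1..words.size).flat_map do |n|
    words.each_cons(n).map { |run| run.join(' ') }
  end
end

p word_groups("foo bar bar2")
# prints ["foo", "bar", "bar2", "foo bar", "bar bar2", "foo bar bar2"]
```

A sentence of n words produces n * (n + 1) / 2 groups this way, so four words give ten groups, and an empty string gives an empty array.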

Upvotes: 3
