Konstantin

Reputation: 3133

How to match lines with multiple regexes at once?

I have an array of tags, usually simple English words, with about 3-6 elements. I need to select the lines from a text file that contain ALL the tags, in any order, matched case-insensitively. How can I achieve this in Ruby? Should I use regexes or a different approach?

For example, I know how to OR regex patterns: /tag1|tag2|tag3/. Is there any way to AND them, something like /tag1 & tag2 & tag3/?

Upvotes: 1

Views: 105

Answers (3)

zx81

Reputation: 41848

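Yes. To AND the tags, put one lookahead per tag right after the start-of-line anchor ^. Each lookahead checks that its tag appears somewhere on the line without consuming any characters, so the tags can appear in any order: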

^(?=.*tag1)(?=.*tag2)(?=.*tag3).*

You can assemble this regex programmatically by looping through your array; add the i option to make it case-insensitive.
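A minimal sketch of that idea, assuming the tags are plain words. The method name all_tags_regex and the file name tags.txt are hypothetical. Regexp.escape guards against tags that contain regex metacharacters, and Regexp::IGNORECASE makes every lookahead case-insensitive.

```ruby
# Build ^(?=.*tag1)(?=.*tag2)... from an array of tags (hypothetical helper).
# Regexp.escape protects tags that contain regex metacharacters;
# Regexp::IGNORECASE makes every lookahead case-insensitive.
def all_tags_regex(tags)
  lookaheads = tags.map { |tag| "(?=.*#{Regexp.escape(tag)})" }.join
  Regexp.new("^#{lookaheads}", Regexp::IGNORECASE)
end

tags  = %w[ruby regex lookahead]
regex = all_tags_regex(tags)

lines = [
  "Lookahead tricks for a Ruby REGEX",
  "ruby and regex, but nothing else",
]
matching = lines.grep(regex)
# => ["Lookahead tricks for a Ruby REGEX"]

# For a real file (hypothetical path), stream it line by line:
# File.foreach("tags.txt").select { |line| line.match?(regex) }
```

The trailing .* from the pattern above is dropped here because match? and grep only need to know whether the line matches, not capture it.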

Upvotes: 5

Mark Thomas

Reputation: 37527

A non-regex approach is:

tags.all? {|tag| string.include? tag}

For case insensitivity, this assumes string is the line already downcased and that the tags are downcased too.
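Applied to the question, a sketch could look like this. The method name line_has_all_tags? and the file name tags.txt are hypothetical; the line and the tags are downcased before the include? checks.

```ruby
# Case-insensitive substring check (hypothetical helper):
# downcase the line once, then require every downcased tag to appear in it.
def line_has_all_tags?(line, tags)
  text = line.downcase
  tags.all? { |tag| text.include?(tag.downcase) }
end

tags = %w[Ruby regex]
line_has_all_tags?("A REGEX question about ruby", tags) # => true
line_has_all_tags?("A regex question about Perl", tags) # => false
line_has_all_tags?("rubygems and regexes", tags)       # => true (substring match)

# Reading a file (hypothetical path):
# File.foreach("tags.txt").select { |line| line_has_all_tags?(line, tags) }
```

The last example shows the limitation of a plain substring test: "ruby" is found inside "rubygems".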

Regular expressions are more flexible, though; for example, they can be made to match only on word boundaries.
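For instance, wrapping each tag in \b word boundaries stops ruby from matching inside rubygems. A sketch, assuming the tags are plain words (\b behaves differently around non-word characters such as +); the method name line_has_all_words? is hypothetical.

```ruby
# One case-insensitive, word-bounded regex per tag (hypothetical helper);
# the line must match all of them.
def line_has_all_words?(line, tags)
  tags.all? { |tag| line.match?(/\b#{Regexp.escape(tag)}\b/i) }
end

tags = %w[ruby regex]
line_has_all_words?("Ruby and REGEX", tags)       # => true
line_has_all_words?("rubygems and regexes", tags) # => false
```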

Upvotes: 1

Cary Swoveland

Reputation: 110755

This is one way you could do it.

Code

def line_contains_tags(str, tags)
  str.scan(/(?:^|\s)(#{tags.join('|')})(?=\s|$)/)
     .flatten(1)
     .uniq.size == tags.size
end

Examples

tags = %w{tag1 tag2 tag3}
line_contains_tags("tag1 tag2 tag3", tags) #=> true
line_contains_tags("tag2 tag1 tag3", tags) #=> true
line_contains_tags("tag1 tag3"     , tags) #=> false
line_contains_tags("tag1 tag1 tag3", tags) #=> false

Explanation

String#scan collects every non-overlapping match of the regex in the string. A match is an element of tags that is preceded by the start of a line or a whitespace character and followed by a whitespace character or the end of a line. The trailing check is a zero-width positive lookahead, so the whitespace after one tag is not consumed and can serve as the leading whitespace of the next tag.

tags = %w{tag1 tag2 tag3}
  #=> ["tag1", "tag2", "tag3"]
regex = /(?:^|\s)(#{tags.join('|')})(?=\s|$)/
  #=> /(?:^|\s)(tag1|tag2|tag3)(?=\s|$)/

str = "tag1 tag2 tag3"
a = str.scan(regex)             #=> [["tag1"], ["tag2"], ["tag3"]]
b = a.flatten(1).uniq           #=> ["tag1", "tag2", "tag3"]
b.size == 3                     #=> true

For the last example,

str = "tag1 tag1 tag3"
a = str.scan(regex).flatten(1).uniq #=> ["tag1", "tag3"]
a.size == 3                     #=> false
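The method above is case-sensitive, while the question asks for case-insensitive matching. One possible adjustment, offered as a sketch rather than as part of this answer: add the i flag, escape the tags, and downcase the captures before uniq so that Tag1 and tag1 count as the same tag. The method name line_contains_tags_ci is hypothetical.

```ruby
# Case-insensitive variant of line_contains_tags (hypothetical name).
def line_contains_tags_ci(str, tags)
  # Alternation of escaped tags, matched case-insensitively.
  alternation = tags.map { |tag| Regexp.escape(tag) }.join('|')
  regex = /(?:^|\s)(#{alternation})(?=\s|$)/i
  # Normalize case before de-duplicating, then compare with the distinct tags.
  found = str.scan(regex).flatten(1).map(&:downcase).uniq
  found.size == tags.map(&:downcase).uniq.size
end

tags = %w{tag1 tag2 tag3}
line_contains_tags_ci("TAG2 tag1 Tag3", tags) #=> true
line_contains_tags_ci("Tag1 tag1 tag3", tags) #=> false
```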

Upvotes: 1
