Reputation: 3133
I have an array of tags, usually simple English words, with about 3-6 elements. I need to select the lines of a text file that contain ALL the tags, in any order and case-insensitively. How can I achieve this in Ruby? Should I use regexes or a different approach?
For example, I know how to logically OR regex patterns: /tag1|tag2|tag3/. Is there any way to logically AND them, something like /tag1 & tag2 & tag3/?
Upvotes: 1
Views: 105
Reputation: 41848
Yes. To AND the tags, use lookaheads after the beginning-of-string anchor ^:
^(?=.*tag1)(?=.*tag2)(?=.*tag3).*
You can assemble this regex programmatically by looping through your array.
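As a minimal sketch of that assembly step: the helper name `all_tags_regex` is hypothetical, and the `Regexp.escape` call and the case-insensitive flag are additions here (the latter to cover the question's requirement), not part of the pattern shown above.

```ruby
# Hypothetical helper: builds one regex that requires every tag to appear.
def all_tags_regex(tags)
  # One lookahead per tag; Regexp.escape guards against regex metacharacters.
  lookaheads = tags.map { |tag| "(?=.*#{Regexp.escape(tag)})" }.join
  # IGNORECASE is an assumption added to satisfy the case-insensitive requirement.
  Regexp.new("^#{lookaheads}.*", Regexp::IGNORECASE)
end

tags  = %w[tag1 tag2 tag3]
regex = all_tags_regex(tags)

lines    = ["Tag3 then TAG1 and tag2", "tag1 and tag3 only"]
matching = lines.grep(regex) # keeps only the lines containing all three tags
```

The same `grep(regex)` call should also work on the lines of a file, for example via `File.foreach(path).grep(regex)`, where `path` is whatever file you are reading.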
Upvotes: 5
Reputation: 37527
A non-regex approach is:
tags.all? {|tag| string.include? tag}
For case insensitivity, assume string is a downcased line and tags are already downcased.
Regular expressions are more flexible; they can be configured to match on word boundaries, etc.
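A rough sketch of both ideas, under stated assumptions: the method names `line_has_all_tags?` and `line_has_all_words?` are hypothetical, and the second method is just one way the word-boundary matching mentioned above could look.

```ruby
# Hypothetical helper: true when the line contains every tag as a substring,
# ignoring case. Both sides are downcased before comparing.
def line_has_all_tags?(line, tags)
  downcased = line.downcase
  tags.all? { |tag| downcased.include?(tag.downcase) }
end

# Hypothetical variant: matches whole words only, so "cat" does not
# match inside "category". \b is a word boundary, /i ignores case.
def line_has_all_words?(line, tags)
  tags.all? { |tag| line =~ /\b#{Regexp.escape(tag)}\b/i }
end
```

To filter a file, something like `File.foreach(path).select { |line| line_has_all_tags?(line, tags) }` should do, with `path` standing in for your file name.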
Upvotes: 1
Reputation: 110755
This is one way you could do it.
Code
def line_contains_tags(str, tags)
  str.scan(/(?:^|\s)(#{tags.join('|')})(?=\s|$)/)
     .flatten(1)
     .uniq.size == tags.size
end
Examples
tags = %w{tag1 tag2 tag3}
line_contains_tags("tag1 tag2 tag3", tags) #=> true
line_contains_tags("tag2 tag1 tag3", tags) #=> true
line_contains_tags("tag1 tag3" , tags) #=> false
line_contains_tags("tag1 tag1 tag3", tags) #=> false
Explanation
The regex scans the string for elements of tags. A match is an element of tags that is preceded by the beginning of the string or a whitespace character and followed by a whitespace character or the end of the string. The trailing condition is a zero-width positive lookahead, so the whitespace after a tag is not consumed and can serve as the leading whitespace of the next match.
tags = %w{tag1 tag2 tag3}
#=> ["tag1", "tag2", "tag3"]
regex = /(?:^|\s)(#{tags.join('|')})(?=\s|$)/
#=> /(?:^|\s)(tag1|tag2|tag3)(?=\s|$)/
str = "tag1 tag2 tag3"
a = str.scan(regex) #=> [["tag1"], ["tag2"], ["tag3"]]
b = a.flatten(1).uniq #=> ["tag1", "tag2", "tag3"]
b.size == 3 #=> true
For the last example,
str = "tag1 tag1 tag3"
a = str.scan(regex).flatten(1).uniq #=> ["tag1", "tag3"]
a.size == 3 #=> false
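The method above is case-sensitive, whereas the question asks for case-insensitive matching. One possible adaptation, offered only as a sketch: the name `line_contains_tags_ci?` is hypothetical, and the downcasing and `Regexp.escape` calls are additions to the original code.

```ruby
# Hypothetical case-insensitive variant of the scan-based approach.
def line_contains_tags_ci?(str, tags)
  wanted  = tags.map(&:downcase).uniq
  # Escape each tag so regex metacharacters in a tag are matched literally.
  pattern = /(?:^|\s)(#{wanted.map { |t| Regexp.escape(t) }.join('|')})(?=\s|$)/
  # Downcase the line so the captured tags compare equal regardless of case.
  found = str.downcase.scan(pattern).flatten(1).uniq
  found.size == wanted.size
end
```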
Upvotes: 1