Brandon Watson

Reputation: 1775

RegEx - Matching a set of words

I've been stuck on this one for a while and can't seem to work it out. Given three words word1, word2 and word3, I would like to construct a regex that matches them in that order, allowing a bounded number of other words between them (but no newlines).

For example, if I had the following:

word1 = what
word2 = the
word3 = hell

I would like each of the following strings to produce a single match:

"what the hell"
"what in the hell"
"what the effing hell"
"what in the 9 doors of hell"

I thought I could do the following (allowing for 0 to 5 words to exist between each word variable):

regex = "\bword1(\b\w+\b){0,5}word2(\b\w+\b){0,5}word3\b"

Alas, no, it doesn't work. It's important that I have a way to specify an m-to-n word distance between the words (where m is always less than n).
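A likely reason the attempt fails, sketched here in Python purely as an illustration (the question does not name a language): `\b` is zero-width and `\w+` only consumes word characters, so nothing in the pattern can consume the spaces between words. The literal words what/the/hell are substituted for word1/word2/word3.

```python
import re

# The pattern from the question, with the example words filled in.
attempt = re.compile(r"\bwhat(\b\w+\b){0,5}the(\b\w+\b){0,5}hell\b")

# Nothing in the pattern can consume the space after "what", so this fails.
print(bool(attempt.search("what the hell")))   # False

# With no spaces at all, every group repeats zero times and the literals line up.
print(bool(attempt.search("whatthehell")))     # True
```

So the pattern as written only matches when the three words are run together, which is the opposite of what is wanted.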

Upvotes: 3

Views: 5262

Answers (3)

Chris

Reputation: 3050

Works for me in Clojure:

(def phrases ["what the hell" "what in the hell" "what the effing hell"
              "what in the 9 doors of hell"])

(def regexp #"\bwhat(\s*\b\w*\b\s*){0,5}the(\s*\b\w*\b\s*){0,5}hell")

(defn valid? []
  (every? identity (map #(re-matches regexp %) phrases)))

(valid?)  ; <-- true

This uses Ben Hughes' pattern.

Upvotes: 0

Greg Bacon

Reputation: 139441

$ cat try
#! /usr/bin/perl

use warnings;
use strict;

my @strings = (
  "what the hell",
  "what in the hell",
  "what the effing hell",
  "what in the 9 doors of hell",
  "hello",
  "what the",
  " what the hell",
  "what the hell ",
);

for (@strings) {
  print "$_: ", /^what(\s+\w+){0,5}\s+the(\s+\w+){0,5}\s+hell$/
                  ? "match\n"
                  : "no match\n";
}

$ ./try
what the hell: match
what in the hell: match
what the effing hell: match
what in the 9 doors of hell: match
hello: no match
what the: no match
 what the hell: no match
what the hell : no match
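The same `\s+\w+` idea can be generalized to an arbitrary m-to-n gap, as the question asks. The following is a minimal Python sketch, not part of the Perl answer: `build_gap_regex` is a hypothetical helper name, `[ \t]+` is an assumed stand-in for "whitespace but not a newline", and the pattern is left unanchored so it can be used with `search`.

```python
import re

def build_gap_regex(words, m=0, n=5):
    """Hypothetical helper: match `words` in order, with between m and n
    filler words separating each consecutive pair (no newlines allowed)."""
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    # m..n filler words, each preceded by spaces/tabs, then the separator
    # that precedes the next required word.
    gap = r"(?:[ \t]+\w+){%d,%d}[ \t]+" % (m, n)
    body = gap.join(re.escape(w) for w in words)
    return re.compile(r"\b" + body + r"\b")

pattern = build_gap_regex(["what", "the", "hell"], 0, 5)
for s in ["what the hell", "what in the 9 doors of hell", "what the", "whatthehell"]:
    print(s, "->", bool(pattern.search(s)))
# what the hell -> True
# what in the 9 doors of hell -> True
# what the -> False
# whatthehell -> False
```

Because the separator after each gap is mandatory, the words can no longer run together, and raising `m` enforces a minimum distance: `build_gap_regex(["what", "the", "hell"], 1, 5)` rejects "what the hell" but accepts "what in the effing hell".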

Upvotes: 1

Ben Hughes

Reputation: 14185

"\bwhat(\s*\b\w*\b\s*){0,5}the(\s*\b\w*\b\s*){0,5}hell" works for me (in Ruby):

list = ["what the hell", "what in the hell", "what the effing hell", 
  "what in the 9 doors of hell", "no match here hell", "what match here hell"]

list.map{|i| /\bwhat(\s*\b\w*\b\s*){0,5}the(\s*\b\w*\b\s*){0,5}hell/.match(i) }
=> [#<MatchData:0x12c4d1c>, #<MatchData:0x12c4d08>, #<MatchData:0x12c4cf4>,
   #<MatchData:0x12c4ce0>, nil, nil]

Upvotes: 2
