Alex Blanck

Reputation: 5

What is the best way to ensure a regex in OCaml matches the entire input string?

In OCaml, I'm trying to check whether a regex matches the entire input string, not just a prefix, a suffix, or the portion of the input string before the first newline.

For example, I don't want the regex [0-9]+ to match strings like these:

let negative_matches = [
    "  123"; (* leading whitespace *)
    "123  "; (* trailing whitespace *)
    "123\n"; (* trailing newline *)
]

I see that Str.string_match still returns true when trailing characters do not match the pattern:

# List.map (fun s -> Str.string_match (Str.regexp "[0-9]+") s 0) negative_matches;;
- : bool list = [false; true; true]

Adding $ to the pattern fixes the second example, but $ is documented to only "match at the end of the line", so the third example still matches:

# List.map (fun s -> Str.string_match (Str.regexp "[0-9]+$") s 0) negative_matches;;
- : bool list = [false; false; true]

I don't see a true "end of string" matcher (like \z in Java and Ruby) documented, so the best answer I've found is to additionally check the length of the input string against the length of the match using Str.match_end:

# List.map (fun s -> Str.string_match (Str.regexp "[0-9]+") s 0 && Str.match_end () = String.length s) negative_matches;;
- : bool list = [false; false; false]

Please tell me I'm missing something obvious and there is an easier way.

Edit: note that I'm not always looking to match against a simple regex like [0-9]+. I'd like a way to match an arbitrary regex against the entire input string.
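For an arbitrary pattern, the `match_end` check can be wrapped in a reusable helper. The sketch below is one possible variant, not a Str built-in: `full_match` is a hypothetical name, and it assumes the pattern is passed as a Str source string. It groups the pattern and appends `$`, so that (as I understand Str's backtracking matcher) a candidate match that stops in the middle of a line is rejected and other alternatives are tried, for example when the first branch of an alternation only covers a prefix. Because `$` also matches just before a `'\n'`, the length check is still needed.

```ocaml
(* Hypothetical helper: true iff the Str pattern [pattern] matches all of [s]. *)
let full_match (pattern : string) (s : string) : bool =
  (* Group the pattern and require "$" after it, so the matcher backtracks
     past candidate matches that end in the middle of a line. *)
  let re = Str.regexp ("\\(" ^ pattern ^ "\\)$") in
  Str.string_match re s 0
  (* "$" also matches just before a '\n', so confirm that the match
     really extends to the end of the string. *)
  && Str.match_end () = String.length s
```

With this helper, `List.map (full_match "[0-9]+") negative_matches` should give `[false; false; false]`, while `full_match "[0-9]+" "123"` is `true`. The pattern is recompiled on every call, which keeps the sketch short but may matter in a hot loop.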

Upvotes: 0

Views: 555

Answers (2)

MikeM

Reputation: 13631

You are missing something obvious. There is an easier way. If

[^0-9]

matches anywhere in the input string, you know it contains a non-digit character.
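For the specific [0-9]+ case, the negated-class idea might look like the sketch below (`all_digits` is a hypothetical name). `Str.search_forward` raises `Not_found` when the pattern occurs nowhere in the string, and the empty string is rejected separately because [0-9]+ needs at least one digit. Note that this trick is tied to this particular pattern and does not carry over to an arbitrary regex.

```ocaml
(* Hypothetical helper: true iff [s] is non-empty and contains only the
   characters 0-9. A '\n' counts as a non-digit, since [^0-9] matches it. *)
let all_digits (s : string) : bool =
  s <> ""
  && (match Str.search_forward (Str.regexp "[^0-9]") s 0 with
      | _ -> false                   (* found a non-digit somewhere *)
      | exception Not_found -> true) (* no non-digit anywhere *)
```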


Unfortunately, I don't think Str offers a better way to ensure the whole string has been matched than your own solution, or the similar, slightly clearer alternative:

Str.string_match (Str.regexp "[0-9]+") s 0 && Str.matched_string s = s

Or you could just check for the presence of a newline character since, as you show, that is the fly in the ointment.
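A sketch of the newline check, assuming that any input containing a '\n' should be rejected outright (`digits_only` is a hypothetical name). Once the string has no '\n', the only place $ can match is the very end of the string, so the anchored pattern then behaves like a whole-string match.

```ocaml
(* Hypothetical helper: whole-string match of [0-9]+, assuming inputs that
   contain '\n' are never valid. *)
let digits_only (s : string) : bool =
  (* Without a '\n' in [s], "$" can only match at the end of the string. *)
  not (String.contains s '\n')
  && Str.string_match (Str.regexp "[0-9]+$") s 0
```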

And, of course, there are other regular expression libraries available that do not have this problem.

Upvotes: 4

Ahmed Laoun

Reputation: 103

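Try this for your example. Note that it uses PCRE-style lookarounds, which OCaml's Str module does not support, so it needs a PCRE-compatible regex engine: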

(?<![^A-z]|\w)[0-9]+(?![^A-z]|\w)

You can test it in an online regex tester. If you want to write other patterns, you can start from these two constructs:

(?<!X) is a negative lookbehind: X is any group you don't want to appear immediately before your match.

(?!X) is a negative lookahead: X is any group you don't want to appear immediately after your match.

Upvotes: 0
