Reputation: 59
I'm trying to write a lexer for a variation on C in OCaml. The lexer needs to match the strings "^" and "||" (the exponentiation and "or" operators, respectively). Both contain special regex characters, and when I try to escape them with a backslash, nothing changes: the code behaves as if "\^" were still beginning-of-line and "\|\|" were still "or or". How can I fix this?
Upvotes: 2
Views: 1995
Reputation: 3970
Backslash characters in string literals have to be doubled to get past the OCaml string parser:
# let r = Str.regexp "\\^" in
Str.search_forward r "FOO^BAR" 0;;
- : int = 3
If you are using OCaml 4.02 or later, you can also use quoted strings ({| ... |}), which do not treat the backslash character specially. This can make the code more readable because backslashes do not have to be doubled:
# let r = Str.regexp {|\^|} in
Str.search_forward r "FOO^BAR" 0;;
- : int = 3
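To make the equivalence of the two spellings concrete, here is a minimal sketch that uses only the standard library (no Str). The names doubled and quoted are just illustrative; the point is that both literals denote the same two-character string, which is what Str.regexp eventually receives.

```ocaml
(* Both literals denote the same two-character string:
   a backslash followed by '^'. *)
let doubled = "\\^"   (* ordinary literal: the backslash is doubled *)
let quoted = {|\^|}   (* quoted string (OCaml >= 4.02): no escaping *)

let () =
  assert (String.length doubled = 2);
  assert (doubled = quoted);
  (* Print the raw characters that would be handed to Str.regexp. *)
  print_endline doubled
```

Running this prints \^, the pattern text as the regex engine sees it.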
Alternatively, consider Str.regexp_string, which creates a regular expression that matches every character of its argument literally. The related function Str.quote returns the escaped pattern as a string instead of a compiled regular expression:
# let r = Str.regexp_string "^" in
Str.search_forward r "FOO^BAR" 0;;
- : int = 3
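Str.quote is handy when the literal text is only part of a larger pattern. The following is a small sketch, assuming the program is linked against the Str library (for example with ocamlfind and -package str); the pattern "one or more carets" is just an illustration and not something from the question.

```ocaml
(* Assumes the Str library is linked,
   e.g. ocamlfind ocamlopt -package str -linkpkg example.ml *)

(* Str.quote returns a string in which regex-special characters are escaped. *)
let escaped = Str.quote "^"

(* Build a larger pattern around the escaped text: one or more literal carets. *)
let carets = Str.regexp (escaped ^ "+")

let () =
  assert (escaped = "\\^");
  assert (Str.search_forward carets "FOO^^BAR" 0 = 3);
  assert (Str.matched_string "FOO^^BAR" = "^^")
```

Str.regexp_string "^" remains the shorter choice when the whole pattern is literal.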
The Str module does not treat | as a special regex character, so you do not have to quote it when you want to match it literally:
# let r = Str.regexp "||" in
Str.search_forward r "FOO||BAR" 0;;
- : int = 3
| has to be quoted (written \| in the pattern, i.e. "\\|" in an ordinary OCaml string literal) only when you want to use it as the "or" construct:
# let r = Str.regexp "BAZ\\|BAR" in
Str.search_forward r "FOOBAR" 0;;
- : int = 3
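For the lexer in the question, these pieces might be combined roughly as follows. This is only a sketch under stated assumptions: the token type and the token_at helper are hypothetical names invented for illustration, and the Str library is assumed to be linked. It uses Str.string_match, which tests for a match starting exactly at the given position.

```ocaml
(* Assumes the Str library is linked (e.g. -package str with ocamlfind). *)

(* Hypothetical token type, for illustration only. *)
type token = Caret | Or | Other of char

let caret_re = Str.regexp_string "^"  (* literal caret *)
let or_re = Str.regexp "||"           (* unescaped | is literal in Str *)

(* Return the token starting at [pos] and the position just after it. *)
let token_at s pos =
  if Str.string_match or_re s pos then (Or, pos + 2)
  else if Str.string_match caret_re s pos then (Caret, pos + 1)
  else (Other s.[pos], pos + 1)

let () =
  assert (token_at "a^b||c" 1 = (Caret, 2));
  assert (token_at "a^b||c" 3 = (Or, 5));
  assert (token_at "a^b||c" 0 = (Other 'a', 1))
```

A real lexer would loop over the input and handle more token kinds; dedicated tools such as ocamllex are the usual choice for that.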
See the documentation of Str.regexp for the full syntax of regular expressions.
Upvotes: 7