Quincy Hsieh

Reputation: 304

The most convenient way to check if a string ends with some text in OCaml?

Hi, I've been searching the internet for a good way to check whether a string ends with certain text in OCaml, and I found that string manipulation in OCaml is not as trivial as I expected compared to other programming languages such as Java.

Here is my OCaml code using Str.regexp to check if the file name ends with ".ml" to see if it is an OCaml script file. It does not work as I expected though:

let r = Str.regexp "*\\.ml" in
if (Str.string_match r file 0)
  then
    let _ = print_endline ("Read file: "^full_path) in
    readFile full_path
  else
    print_endline (full_path^" is not an OCaml file")

Note that readFile is a function I wrote myself to read the file at the constructed full_path. The output always looks like this:

./utilities/dict.ml is not an OCaml file
./utilities/dict.mli is not an OCaml file
./utilities/error.ml is not an OCaml file
./utilities/error.mli is not an OCaml file

What is wrong with my regexp, and is there a better/simpler way to check how a string ends?

Upvotes: 5

Views: 3459

Answers (2)

nomaddo

Reputation: 446

You are probably confusing two styles of pattern:

  • Glob (the wildcard patterns used in bash and other shells)
    In this style, * on its own matches the empty string or any sequence of characters.
  • Posix (the style used in this case)

You need to check the documentation of Str carefully.
http://caml.inria.fr/pub/docs/manual-ocaml/libref/Str.html

This says:
  .  Matches any character except newline.
  *  Matches the preceding expression zero, one or several times.

As you can see, the Str library adopts the latter style. So, to define the regexp, you need to write something like

let r = Str.regexp ".*\\.ml";;
val r : Str.regexp = <abstr>

Str.string_match r "fuga.ml" 0;;
- : bool = true

Str.string_match r "fugaml" 0;;
- : bool = false

Str.string_match r "piyo/null/fuga.ml" 0;;
- : bool = true

If you want to use glob-style patterns,
you can use the re library.

In my opinion, you don't need a regexp to solve your problem.
Just check whether the input ends with the suffix ".ml" using appropriate string functions.
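As a minimal sketch of that regexp-free approach, the suffix test can be written with String.length and String.sub from the standard library. The name ends_with and the sample file names are only illustrative; on OCaml 4.13 and later, the built-in String.ends_with ~suffix does the same job.

```ocaml
(* Sketch: true when [s] ends with [suffix]; uses only the standard library. *)
let ends_with ~suffix s =
  let ls = String.length s and lx = String.length suffix in
  (* The suffix cannot be longer than the string itself. *)
  ls >= lx && String.sub s (ls - lx) lx = suffix

let () =
  List.iter
    (fun f -> Printf.printf "%s -> %b\n" f (ends_with ~suffix:".ml" f))
    ["./utilities/dict.ml"; "./utilities/dict.mli"; "fugaml"]
```

Only the first of the three sample names ends with ".ml", so only that one prints true.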

Upvotes: 3

ivg

Reputation: 35210

First of all, your regexp is incorrect: you forgot the . before the *. The correct version is:

let r = Str.regexp {|.*\.ml|}

Note the use of the quoted string literal syntax, {|...|}, which lets you write the regexp without doubling every backslash. With a regular double-quoted string, it looks like this:

let r = Str.regexp ".*\\.ml"

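This regular expression is not ideal, as it also matches file.mlx, file.ml.something.else, etc.: Str.string_match only requires the pattern to match a prefix of the string starting at the given position, not the whole string. A better version, which anchors the end with $ and matches all the usual OCaml source file names, is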

let r = Str.regexp {|.*\.ml[ily]?$|}

Instead of a regexp, you can also use the Filename module from the standard library, which has a check_suffix function:

let is_ml file = Filename.check_suffix file ".ml"

To check all possible extensions:

let srcs = [".ml"; ".mli"; ".mly"; ".mll"]
let is_ocaml file = List.exists (Filename.check_suffix file) srcs
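A possible variant of the same idea, sketched here with Filename.extension (standard library, OCaml 4.04 and later), compares the file's last extension against the list instead of testing each suffix in turn. The names ocaml_exts and is_ocaml_source and the sample paths are hypothetical.

```ocaml
(* Extensions treated as OCaml sources; same list as above. *)
let ocaml_exts = [".ml"; ".mli"; ".mly"; ".mll"]

(* Filename.extension returns the last extension including its dot,
   or "" when the file name has none. *)
let is_ocaml_source path = List.mem (Filename.extension path) ocaml_exts

let () =
  List.iter
    (fun p -> Printf.printf "%s: %b\n" p (is_ocaml_source p))
    ["./utilities/dict.ml"; "./utilities/error.mli"; "file.mlx"; "file.ml.bak"]
```

With these sample paths, the first two are accepted and the last two are rejected, since their last extensions are .mlx and .bak.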

Upvotes: 8
