Palash Nigam

Reputation: 2012

OCaml Str.full_split returns the original string instead of the expected substring

I am trying to write a program that reads diff files and returns just the filenames. So I wrote the following code:

open Printf
open Str
let syname: string = "diff --git a/drivers/usc/filex.c b/drivers/usc/filex"

let fileb = 
  let pat_filename = Str.regexp "a\/(.+)b" in
  let s = Str.full_split pat_filename syname in
  s

let print_split_res (elem: Str.split_result) =
  match elem with
  | Text t -> print_string t
  | Delim d -> print_string d

let rec print_list (l: Str.split_result list) =
  match l with
  | [] -> ()
  | hd :: tl -> print_split_res hd ; print_string "\n" ; print_list tl
;;

() = print_list fileb

Upon running this, I get the original string diff --git a/drivers/usc/filex.c b/drivers/usc/filex back as the output.

Whereas if I use the same regex pattern with the Python standard library, I get the desired result:

import re
p=re.compile('a\/(.+)b')
p.findall("diff --git a/drivers/usc/filex.c b/drivers/usc/filex")

Output: ['drivers/usc/filex.c ']

What am I doing wrong?

Upvotes: 0

Views: 73

Answers (1)

Jeffrey Scofield

Reputation: 66818

Not to be snide, but the way to understand OCaml regular expressions is to read the documentation, not compare to things in another language :-) Sadly, there is no real standard for regular expressions across languages.

The main problem appears to be that parentheses in OCaml regular expressions match themselves. To get grouping behavior they need to be escaped with '\\'. In other words, your pattern is looking for actual parentheses in the filename. Your code works for me if you change your regular expression to this:

Str.regexp "a/\\(.+\\)b"

Note that the backslashes must themselves be escaped so that Str.regexp sees them.

You also have the problem that your pattern doesn't match the slash after b, so the text that follows the match will start with a slash.
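
To make both points concrete, here is a minimal sketch that runs the corrected pattern through Str.full_split and prints each piece together with its constructor. The helper name show and the binding pieces are only for illustration, and the build command in the comment assumes ocamlfind is available.

```ocaml
(* Needs the str library, for example:
   ocamlfind ocamlopt -package str -linkpkg demo.ml *)
let syname = "diff --git a/drivers/usc/filex.c b/drivers/usc/filex"

(* Escaped parentheses form a group; there is no backslash before the slash. *)
let pat_filename = Str.regexp "a/\\(.+\\)b"

(* Hypothetical helper: render one split_result with its constructor name. *)
let show (elem : Str.split_result) : string =
  match elem with
  | Str.Text t -> Printf.sprintf "Text %S" t
  | Str.Delim d -> Printf.sprintf "Delim %S" d

(* full_split returns the matched delimiters as well as the text around them. *)
let pieces = Str.full_split pat_filename syname

let () = List.iter (fun e -> print_endline (show e)) pieces
```

With this input the list should have three elements: the text before the match, the whole match as a Delim (including the leading a/ and the trailing b), and the remaining text, which begins with the unmatched slash.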

As a side comment, I also removed the backslash before /, because \/ is technically not a valid escape sequence in an OCaml string literal.
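
If the goal is only the filename, one possible approach (not the only one) is to skip splitting and read the group directly with Str.search_forward and Str.matched_group. The sketch below assumes the diff line has the usual "a/... b/..." shape and extends the pattern with " b/" so the captured text has no trailing space; the function name filename_of_diff_line is hypothetical.

```ocaml
(* Needs the str library, for example:
   ocamlfind ocamlopt -package str -linkpkg demo.ml *)
let line = "diff --git a/drivers/usc/filex.c b/drivers/usc/filex"

(* Group 1 is the path after a/; the space, b and slash are matched outside it.
   Assumption: the line follows the "a/<path> b/<path>" layout. *)
let pat = Str.regexp "a/\\(.+\\) b/"

(* Hypothetical helper: return the text captured by group 1, if the line matches. *)
let filename_of_diff_line (s : string) : string option =
  match Str.search_forward pat s 0 with
  | _ -> Some (Str.matched_group 1 s)
  | exception Not_found -> None

let () =
  match filename_of_diff_line line with
  | Some name -> print_endline name
  | None -> print_endline "no match"
```

Str.matched_group must be given the same string that was just searched, since Str keeps the match positions in global state rather than in a match object.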

Upvotes: 2
