Leoking938

Reputation: 81

Convert a string into array of words

How do I split a string into a list or array of whitespace-separated words? For example, given:

let x = "this is my sentence";;

I want to store the words in a list or array like this:

 ["this"; "is"; "my"; "sentence"]

Upvotes: 2

Views: 1452

Answers (3)

Chris

Reputation: 36496

Posting from a future in which OCaml has sequences (the Seq module), to offer an alternative that doesn't require building an entire list unless you actually need one.

We can lazily iterate over the string character by character and use an auxiliary function, aux, to decide when to yield a word. Its cur argument builds up each word in turn and is reset to "" once that word has been yielded.

module CharSet = Set.Make (Char)

let split_words seps s =
  let rec aux seq cur () =
    match seq () with
    | Seq.Nil when cur = "" -> Seq.Nil
    | Seq.Nil -> Seq.Cons (cur, Seq.empty)
    | Seq.Cons (ch, next) ->
      let is_sep = CharSet.mem ch seps in
      if is_sep && cur = "" then
        aux next "" ()
      else if is_sep then
        Seq.Cons (cur, aux next "")
      else
        aux next (Printf.sprintf "%s%c" cur ch) ()
  in
  aux (String.to_seq s) ""

# let x = "this is my sentence" in
  x
  |> split_words @@ CharSet.of_list [' '; '\t'; '\n']
  |> List.of_seq;; 
- : string list = ["this"; "is"; "my"; "sentence"]
# let x = "this is my sentence" in
  x
  |> split_words @@ CharSet.of_list [' '; '\t'; '\n']
  |> Array.of_seq;; 
- : string array = [|"this"; "is"; "my"; "sentence"|]
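The Printf.sprintf call above copies the whole partial word for every character it adds. A minimal sketch of the same lazy idea, assuming OCaml 4.07+ (for String.to_seq and List.of_seq), collects each word in a fresh Buffer instead and takes a separator predicate rather than a CharSet. The names split_words_buf and is_space are hypothetical, chosen for illustration:

```ocaml
(* Hypothetical variant of split_words: same lazy Seq approach, but each
   word is accumulated in its own Buffer, and separators are given as a
   predicate instead of a CharSet. Requires OCaml >= 4.07. *)
let split_words_buf (is_sep : char -> bool) (s : string) : string Seq.t =
  (* Skip leading separators, then start collecting the next word. *)
  let rec next_word seq () =
    match seq () with
    | Seq.Nil -> Seq.Nil
    | Seq.Cons (ch, rest) when is_sep ch -> next_word rest ()
    | Seq.Cons (ch, rest) ->
      let buf = Buffer.create 16 in
      Buffer.add_char buf ch;
      collect buf rest
  (* Add characters to [buf] until a separator or the end of input,
     then yield the word; the rest of the input stays unforced. *)
  and collect buf seq =
    match seq () with
    | Seq.Nil -> Seq.Cons (Buffer.contents buf, Seq.empty)
    | Seq.Cons (ch, rest) when is_sep ch ->
      Seq.Cons (Buffer.contents buf, next_word rest)
    | Seq.Cons (ch, rest) ->
      Buffer.add_char buf ch;
      collect buf rest
  in
  next_word (String.to_seq s)

(* Hypothetical separator predicate: space, tab, newline, carriage return. *)
let is_space = function ' ' | '\t' | '\n' | '\r' -> true | _ -> false

let () =
  let words = split_words_buf is_space "  this is\tmy  sentence\n" in
  List.of_seq words |> List.iter print_endline;
  (* Forcing only the head scans the input up to the end of the first word. *)
  match words () with
  | Seq.Cons (first, _) -> print_endline ("first: " ^ first)
  | Seq.Nil -> ()
```

Because each word gets its own buffer, no mutable state is shared between forcings, so the sequence can safely be traversed more than once.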

Upvotes: 0

user1971598

Reputation:

The full process goes like this:

First, install the re library: opam install re

If you are using utop, you can do something like this:

#require "re.pcre"

let () =
  Re_pcre.split ~rex:(Re_pcre.regexp " +") "Hello world more"
  |> List.iter print_endline

Then run it with: utop code.ml

If you want to compile to native code, you'd have:

let () =
  Re_pcre.split ~rex:(Re_pcre.regexp " +") "Hello world more"
  |> List.iter print_endline

Notice that the #require line is gone.

Then, at the command line, build it with: ocamlfind ocamlopt -package re.pcre code.ml -linkpkg -o Test

The OCaml website has plenty of tutorials and help. I also have a blog post designed to get you up to speed quickly: http://hyegar.com/2015/10/20/so-youre-learning-ocaml/

Upvotes: 1

BWStearns

Reputation: 2706

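Using Str.split_delim and the Str.regexp type from the Str library that ships with OCaml (it is not part of the core standard library, so in the plain ocaml toplevel you need to run #load "str.cma";; first):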

Str.split_delim (Str.regexp " ") "this is my sentence";;
- : string list = ["this"; "is"; "my"; "sentence"]

I highly recommend getting utop; it's really good for quickly searching through libraries (I typed Str, saw it was there, then typed Str. and looked through the completions for the appropriate function).
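Note that Str.split_delim (Str.regexp " ") yields an empty string between two consecutive spaces. If you'd rather avoid Str entirely, here is a minimal sketch using only the standard library, assuming OCaml 4.04+ for String.split_on_char. Treating tab, newline and carriage return as white space (by mapping them to spaces first) is an assumption, and the name words_of_string is hypothetical:

```ocaml
(* Hypothetical helper: split on runs of white space using only Stdlib.
   String.split_on_char needs OCaml >= 4.04. *)
let words_of_string (s : string) : string list =
  s
  (* Assumption: tab, newline and CR count as separators, like space. *)
  |> String.map (function '\t' | '\n' | '\r' -> ' ' | c -> c)
  |> String.split_on_char ' '
  (* Consecutive separators produce empty strings; drop them. *)
  |> List.filter (fun w -> w <> "")

let () =
  let x = "this is  my\tsentence" in
  let l = words_of_string x in
  (* The question also asks for an array; convert the list. *)
  let a = Array.of_list l in
  List.iter print_endline l;
  Printf.printf "%d words\n" (Array.length a)
```

This builds the whole list eagerly, which is fine for short strings; the lazy Seq approach above avoids that when you only need to consume words one at a time.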

Upvotes: 2
