kjo
kjo

Reputation: 35321

How to reduce code clutter in this function?

The function tally below is really simple: it takes a string s as argument, splits it on non-alphanumeric characters, and tallies the numbers of the resulting "words", case-insensitively.

open Core.Std

let tally s =
  let get m k =
    match Map.find m k with
    | None   -> 0
    | Some n -> n
  in

  let upd m k = Map.add m ~key:k ~data:(1 + get m k) in

  let re = Str.regexp "[^a-zA-Z0-9]+" in
  let ws = List.map (Str.split re s) ~f:String.lowercase in

  List.fold_left ws ~init:String.Map.empty ~f:upd

I think this function is harder to read than it should be due to clutter. I wish I could write something closer to this (where I've indulged in some "fantasy syntax"):

(* NOT VALID SYNTAX -- DO NOT COPY !!! *)

open Core.Std

let tally s =

  let get m k =
        match find m k with
        | None   -> 0
        | Some n -> n ,

      upd m k = add m k (1 + get m k) ,

      re = regexp "[^a-zA-Z0-9]+" ,
      ws = map (split re s) lowercase 

  in fold_left ws empty upd

The changes I did above fall primarily into three groups:

  1. get rid of the repeated let ... in's, consolidated all the bindings (into a ,-separated sequence; this, AFAIK, is not valid OCaml);
  2. got rid of the ~foo:-type noise in function calls;
  3. got rid of the prefixes Str., List., etc.

Can I achieve similar effects using valid OCaml syntax?

Upvotes: 1

Views: 1564

Answers (2)

ivg
ivg

Reputation: 35210

If you have a sequence of computations on the same value, then in OCaml there is a |> operator, that takes a value from the left, and applies in to the function on the right. This can help you to "get rid of" let and in. What concerning labeled arguments, then you can get rid of them by falling back to a vanilla standard library, and make your code smaller, but less readable. Anyway, there is a small piece of sugar with labeled arguments, you can always write f ~key ~data instead of f ~key:key ~data:data. And, finally, module names can be removed either by local open syntax (let open List in ...) or by locally shorcutting it to a smaller names (let module L = List in).

Anyway, I would like to show you a code, that contains less clutter, to my opinion:

open Core.Std
open Re2.Std
open Re2.Infix
module Words = String.Map

let tally s =
  Re2.split ~/"\\PL" s |>
  List.map ~f:(fun s -> String.uppercase s, ()) |>
  Words.of_alist_multi |>
  Words.map ~f:List.length

Upvotes: 3

didierc
didierc

Reputation: 14730

Readability is difficult to achieve, it highly depends on the reader's abilities and familiarity with the code. I'll focus simply on the syntax transformations, but you could perhaps refactor the code in a more compact form, if this is what you are really looking for.

To remove the module qualifiers, simply open them beforehand:

open Str 
open Map
open List

You must open them in that order to make sure the List values you are using there are still reachable, and not scope-overridden by the Map ones.

For labelled parameters, you may omit the labels if for each function call you provide all the parameters of the function in the function signature order.

To reduce the number of let...in constructs, you have several options:

  1. Use a set of rec definitions:

    let tally s =
      let rec get m k =
         match find m k with
         | None   -> 0
         | Some n -> n
    
      and upd m k = add  m k (1 + get m k)
    
      and re = regexp "[^a-zA-Z0-9]+" 
      and ws = map lowercase (split re s)
    
    in fold_left ws empty upd
    
  2. Make multiple definitions at once:

    let tally s =
      let get, upd, ws =
        let  re = regexp "[^a-zA-Z0-9]+"  in
        fun m k ->
         match find m k with
         | None   -> 0
         | Some n -> n,
      fun g m k -> add  m k (1 + g m k),
      map lowercase (split re s)
    
    in fold_left ws empty (upd get)
    
  3. Use a module to group your definitions:

    let tally s =
      let module M = struct 
        let get m k =
          match find m k with
          | None   -> 0
          | Some n -> n
    
        let upd m k = add m k (1 + get m k)
    
        let re = regexp "[^a-zA-Z0-9]+" 
        let ws = map lowercase (split re s)
    
    end in fold_left ws empty M.upd
    

The later is reminiscent of the Sml syntax, and perhaps better suited to proper optimization by the compiler, but it only get rid of the in keywords.

Please note that since I am not familiar with the Core Api, I might have written incorrect code.

Upvotes: 3

Related Questions