Reputation: 835
I have a tiny grammar represented as a variant type term
with strings that are tokens/part of tokens (type term).
Given expressions from the grammar, I am collecting all strings from expressions and pack them into sets (function vars
). Finally, I want to create some graph with these sets as vertices (lines 48-49).
For some reason, the graph created in the such sophisticated way does not recognise sets containing same variables and creates multiple vertices with the same content. I don't really understand why this is happening.
Here is minimal working example with this behaviour:
(* demo.ml *)
type term =
| Var of string
| List of term list * string option
| Tuple of term list
module SSet = Set.Make(
struct
let compare = String.compare
type t = string
end)
let rec vars = function
| Var v -> SSet.singleton v
| List (x, tail) ->
let tl = match tail with
| None -> SSet.empty
| Some var -> SSet.singleton var in
SSet.union tl (List.fold_left SSet.union SSet.empty (List.map vars x))
| Tuple x -> List.fold_left SSet.union SSet.empty (List.map vars x)
module Node = struct
type t = SSet.t
let compare = SSet.compare
let equal = SSet.equal
let hash = Hashtbl.hash
end
module G = Graph.Imperative.Digraph.ConcreteBidirectional(Node)
(* dot output for the graph for illustration purposes *)
module Dot = Graph.Graphviz.Dot(struct
include G
let edge_attributes _ = []
let default_edge_attributes _ = []
let get_subgraph _ = None
let vertex_attributes _ = []
let vertex_name v = Printf.sprintf "{%s}" (String.concat ", " (SSet.elements v))
let default_vertex_attributes _ = []
let graph_attributes _ = []
end)
let _ =
(* creation of two terms *)
let a, b = List ([Var "a"], Some "b"), Tuple [Var "a"; Var "b"] in
(* get strings from terms packed into sets *)
let avars, bvars = vars a, vars b in
let g = G.create () in
G.add_edge g avars bvars;
Printf.printf "The content is the same: [%s] [%s]\n"
(String.concat ", " (SSet.elements avars))
(String.concat ", " (SSet.elements bvars));
Printf.printf "compare/equal output: %d %b\n"
(SSet.compare avars bvars)
(SSet.equal avars bvars);
Printf.printf "Hash values are different: %d %d\n"
(Hashtbl.hash avars) (Hashtbl.hash bvars);
Dot.fprint_graph Format.str_formatter g;
Printf.printf "Graph representation:\n%s" (Format.flush_str_formatter ())
In order to compile, type ocamlc -c -I +ocamlgraph demo.ml; ocamlc -I +ocamlgraph graph.cma demo.cmo
. When the program is executed you get this output:
The content is the same: [a, b] [a, b]
compare/equal output: 0 true
Hash values are different: 814436103 1017954833
Graph representation:
digraph G {
{a, b};
{a, b};
{a, b} -> {a, b};
{a, b} -> {a, b};
}
To sum up, I am curious why there are non-equal hash values for sets and two identical vertices are created in the graph, despite the fact these sets are equal by all other means.
Upvotes: 3
Views: 119
Reputation: 835
As Jeffrey pointed out, it seems that the problem is in the definition of the hash function that is part of Node
module.
Changing it to let hash x = Hashtbl.hash (SSet.elements x)
fixed the issue.
Upvotes: 1
Reputation: 66823
I suspect the general answer is that OCaml's built-in hashing is based on rather physical properties of a value, while set equality is a more abstract notion. If you represent sets as ordered binary trees, there are many trees that represent the same set (as is well known). These will be equal as sets but might very well hash to different values.
If you want hashing to work for sets, you might have to supply your own function.
Upvotes: 4