Reputation: 113
I am currently trying to understand the behavior of Erlang Sets, when I compute Sets on String Anagrams. In my understanding, two Anagrams should produce two identical Sets of strings.
Set1 = sets:from_list("orchestra").
Set2 = sets:from_list("carthorse").
Set1 =:= Set2. %% true
However, using sets:intersection
we receive a different set, which is not equal to the first two sets.
Intersection = sets:intersection(Set1, Set2).
Intersection =:= Set1. %% false
Intersection =:= Set2. %% false
Is there a particular reason for this behavior, based on how Set-Intersections are computed in Erlang? Many thanks in advance!
Upvotes: 0
Views: 212
Reputation: 1782
Let's take the code and look at what happens along the way.
We are defining 2 sets
Set1 = sets:from_list("orchestra").
{set,8,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],"sc",[],[],[],"o","r","e","h",[],[],"a","t",[],[],[]}}}
and
Set2 = sets:from_list("carthorse").
{set,8,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],"sc",[],[],[],"o","r","e","h",[],[],"a","t",[],[],[]}}}
where the second lines represent the values created with a representation. So based on the above definition, we infer that the representation chosen for storing sets is of a tuple.
We can refer to the Term Comparison section of Erlang Reference manual which states that
...Lists are compared element by element. Tuples are ordered by size, two tuples with the same size are compared element by element...
Having that as an invariant, now lets look at the set which comes about after an intersection
Intersection = sets:intersection(Set1, Set2).
{set,8,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],"cs",[],[],[],"o","r","e","h",[],[],"a","t",[],[],[]}}}
Again, this being a set gets the same underlying representation of the tuple, but if we look at the last element, there is a string (essentially a list of characters in Erlang) "cs"
, which is different in value in the same place when we defined Set1
and Set2
. And hence the inequality as per the Term Comparison
.
Now, let's attempt to change the sub-value "cs"
inside Intersection
to "sc"
and then as per the Term Comparison
rules, two sets must satisfy equality.
Improvised_Intersection = setelement(9, Intersection, {setelement(2, element(1, element(9, Intersection)), "sc")}).
{set,8,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],"sc",[],[],[],"o","r","e","h",[],[],"a","t",[],[],[]}}}
And now let's compare Improvised_Intersection
with Set1
and Set2
, which gives
Improvised_Intersection =:= Set1.
true
Improvised_Intersection =:= Set2.
true
Intersection =:= Improvised_Intersection.
false
It's just an attempt to provide some insight into what already @Richard and others have pointed out.
Upvotes: 1
Reputation: 10557
The implementation of the sets
module does not guarantee that two sets can be compared with =:=
even if they contain the same elements. The internal data structure can differ. You could use operations like is_subset/2
or subtract/2
(relatively inefficient), or you could use to_list/1
and then lists:sort/1
to get two lists that could be compared directly. But if you're starting from strings (lists of characters) anyway, you would be better off using ordsets
right away. These are ordered lists which you can manipulate as sets, and can be directly compared. For small-ish sets they are usually more efficient that sets
anyway.
> Set1 = ordsets:from_list("orchestra").
"acehorst"
> Set2 = ordsets:from_list("carthorse").
"acehorst"
> Set1 =:= Set2.
true
> Intersection = ordsets:intersection(Set1, Set2).
"acehorst"
> Intersection =:= Set1.
true
> Intersection =:= Set2.
true
Upvotes: 1
Reputation: 241868
The =:=
operator compares the representations of Set1
and Intersection
, but there's no guarantee what the representations are or that the same set only has one representation.
The documentation of sets only talks about =:=
when it describes how it compares elements of sets, not the sets themselves.
For set equality, you can define
set_eq(S1, S2) ->
sets:is_subset(S1, S2) andalso sets:is_subset(S2, S1).
Upvotes: 1