GitCrush
GitCrush

Reputation: 113

Erlang - String into Set

I am currently trying to understand the behavior of Erlang Sets, when I compute Sets on String Anagrams. In my understanding, two Anagrams should produce two identical Sets of strings.

Set1 = sets:from_list("orchestra"). 
Set2 = sets:from_list("carthorse"). 
Set1 =:= Set2. %% true

However, using sets:intersection we receive a different set, which is not equal to the first two sets.

Intersection = sets:intersection(Set1, Set2). 
Intersection =:= Set1. %% false
Intersection =:= Set2. %% false

Is there a particular reason for this behavior, based on how Set-Intersections are computed in Erlang? Many thanks in advance!

Upvotes: 0

Views: 212

Answers (3)

Nalin Ranjan
Nalin Ranjan

Reputation: 1782

Let's take the code and look at what happens along the way.

We are defining 2 sets

Set1 = sets:from_list("orchestra"). 
{set,8,16,16,8,80,48,
     {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
     {{[],"sc",[],[],[],"o","r","e","h",[],[],"a","t",[],[],[]}}}

and

Set2 = sets:from_list("carthorse"). 
{set,8,16,16,8,80,48,
     {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
     {{[],"sc",[],[],[],"o","r","e","h",[],[],"a","t",[],[],[]}}}

where the second lines represent the values created with a representation. So based on the above definition, we infer that the representation chosen for storing sets is of a tuple.

We can refer to the Term Comparison section of Erlang Reference manual which states that

...Lists are compared element by element. Tuples are ordered by size, two tuples with the same size are compared element by element...

Having that as an invariant, now lets look at the set which comes about after an intersection

Intersection = sets:intersection(Set1, Set2).
{set,8,16,16,8,80,48,
     {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
     {{[],"cs",[],[],[],"o","r","e","h",[],[],"a","t",[],[],[]}}}

Again, this being a set gets the same underlying representation of the tuple, but if we look at the last element, there is a string (essentially a list of characters in Erlang) "cs", which is different in value in the same place when we defined Set1 and Set2. And hence the inequality as per the Term Comparison.

Now, let's attempt to change the sub-value "cs" inside Intersection to "sc" and then as per the Term Comparison rules, two sets must satisfy equality.

Improvised_Intersection = setelement(9, Intersection, {setelement(2, element(1, element(9, Intersection)), "sc")}).
{set,8,16,16,8,80,48,
     {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
     {{[],"sc",[],[],[],"o","r","e","h",[],[],"a","t",[],[],[]}}}

And now let's compare Improvised_Intersection with Set1 and Set2, which gives

Improvised_Intersection =:= Set1.                                                                                  
true

Improvised_Intersection =:= Set2.                                                                                  
true

Intersection =:= Improvised_Intersection.
false

It's just an attempt to provide some insight into what already @Richard and others have pointed out.

Upvotes: 1

RichardC
RichardC

Reputation: 10557

The implementation of the sets module does not guarantee that two sets can be compared with =:= even if they contain the same elements. The internal data structure can differ. You could use operations like is_subset/2 or subtract/2 (relatively inefficient), or you could use to_list/1 and then lists:sort/1 to get two lists that could be compared directly. But if you're starting from strings (lists of characters) anyway, you would be better off using ordsets right away. These are ordered lists which you can manipulate as sets, and can be directly compared. For small-ish sets they are usually more efficient that sets anyway.

> Set1 = ordsets:from_list("orchestra").
"acehorst"
> Set2 = ordsets:from_list("carthorse").
"acehorst"
> Set1 =:= Set2.
true
> Intersection = ordsets:intersection(Set1, Set2).
"acehorst"
> Intersection =:= Set1.
true
> Intersection =:= Set2.
true

Upvotes: 1

choroba
choroba

Reputation: 241868

The =:= operator compares the representations of Set1 and Intersection, but there's no guarantee what the representations are or that the same set only has one representation.

The documentation of sets only talks about =:= when it describes how it compares elements of sets, not the sets themselves.

For set equality, you can define

set_eq(S1, S2) ->
    sets:is_subset(S1, S2) andalso sets:is_subset(S2, S1).

Upvotes: 1

Related Questions