Reputation: 10026

Why are sets in many programming languages not really sets?

In several programming languages there are Set collections which are supposed to be implementations of the mathematical concept of a finite set.

However, this is not necessarily true, for example in C# and Java, both implementations of HashSet<T> allow you to add any HashSet<T> collection as a member of itself. Which by the modern definition of a mathematical set is not allowed.

Background:

According to naive set theory, the definition of a set is:

A set is a collection of distinct objects.

However, this definition lead famously to Russel's Paradox as well as other paradoxes. For convenience, Russel's Paradox is:

Let R be the set of all sets that are not members of themselves. If R is not a member of itself, then its definition dictates that it must contain itself, and if it contains itself, then it contradicts its own definition as the set of all sets that are not members of themselves.

So according to modern set theory (See: ZFC), the definition of a set is:

A set is a collection of distinct objects, none of which is the set itself.

Specifically, this is a result of the axiom of regularity.

Well so what? What are the implications of this? Why is this question on StackOverflow?

One of the implications of Russel's Paradox is that not all collections are sets. Additionally, this was the point where mathematicians dropped the definition of a set as being the usual English definition. So I believe this question has a lot of weight when it comes to programming language design in general.

Question(s):

So why would programming languages, who in some form, use these principles in their very design just ignore it in the implementation of the Set in their languages libraries?

Secondly, is this a common occurrence with other implementations of mathematical concepts?

Perhaps I'm being a bit nit picky, but if these are to be true implementations of Sets, then why would part of the very definition be ignored?

Update

Addition of C# and Java code snippets exemplifying behavior:

Java Snippet:

Set<Object> hashSet = new HashSet<Object>();
hashSet.add(1);
hashSet.add("Tiger");
hashSet.add(hashSet);
hashSet.add('f');

Object[] array = hashSet.toArray();
HashSet<Object> hash = (HashSet<Object>)array[3];

System.out.println("HashSet in HashSet:");       
for (Object obj : hash)
    System.out.println(obj);

System.out.println("\nPrinciple HashSet:");
for (Object obj : hashSet)
    System.out.println(obj);

Which prints out:

HashSet in HashSet:
f
1
Tiger
[f, 1, Tiger, (this Collection)]

Principle HashSet:
f
1
Tiger
[f, 1, Tiger, (this Collection)]

C# Snippet:

HashSet<object> hashSet = new HashSet<object>();
hashSet.Add(1);
hashSet.Add("Tiger");
hashSet.Add(hashSet);
hashSet.Add('f');

object[] array = hashSet.ToArray();
var hash = (HashSet<object>)array[2];

Console.WriteLine("HashSet in HashSet:");
foreach (object obj in hash)
     Console.WriteLine(obj);

Console.WriteLine("\nPrinciple HashSet:");
foreach (object obj in hashSet)
     Console.WriteLine(obj);

Which prints out:

HashSet in HashSet:
1
Tiger
System.Collections.Generic.HashSet`1[System.Object]
f

Principle HashSet:
1
Tiger
System.Collections.Generic.HashSet`1[System.Object]
f

Update 2

In regards to Martijn Courteaux's second point which was that it could be done in the name of computational efficiency:

I made two test collections in C#. They were identical, except in the Add method of one of the them - I added the following check: if (this != obj) where obj is the item being added to the collection.

I clocked both of them separately where they were to add 100,000 random integers:

With Check: ~ 28 milliseconds

Without Check: ~ 21 milliseconds

This is a fairly significant performance jump.

Upvotes: 6

Answers (4)

Alexey Romanov

Reputation: 170745

Programming language sets really aren't like ZFC sets, but for quite different reasons than you suppose:

You can't form a set by comprehension (i.e. set of all objects such that ...). Note that this already blocks all (I believe) naive set theory paradoxes, so they are irrelevant.
They usually can't be infinite.
There exist objects which aren't sets (in ZFC there are only sets).
They are usually mutable (i.e. you can add/remove elements to/from the set).
The objects they contain can be mutable.

So the answer to

So why would programming languages, who in some form, use these principles in their very design just ignore it in the implementation of the Set in their languages libraries?

is that the languages don't use these principles.

Upvotes: 10

Martijn Courteaux

Reputation: 68857

Well, I think that this is because of some reasons:

For very specific programming purposes, you might want to create a set that contains itself. When you are doing this, you are not really caring about what a set mathematically means and you just want to enjoy the functionality a Set offers: "Adding" elements to the set without creating duplicate entries. (I have to be honest I can't think of a situation where you want to do that.)
For performance purposes. The chance you want to use a Set and let it contain itself is very very rare. So, it would be a waste of computing power, energy, time, performance and environmental health to check each time you try to add an element if it would be the set itself.

Upvotes: 4

Pawel

Reputation: 1467

In java the set is mathematical set. You can insert objects, but only one can be in the set. Information from javadoc about set:

"A collection that contains no duplicate elements. More formally, sets contain no pair of elements e1 and e2 such that e1.equals(e2), and at most one null element. As implied by its name, this interface models the mathematical set abstraction."

Source: http://docs.oracle.com/javase/7/docs/api/

Upvotes: 0

assylias

Reputation: 328649

I can't speak for C#, but as far as Java is concerned a set is a set. If you look at the javadoc for the Set interface, you will see (emphasis mine):

Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.

Whether the prohibition is actively enforced is unclear (adding a HashSet to itself does not throw any exceptions for example), but at least it is clearly documented that you should not try because the behaviour would then be unspecified.

Upvotes: 4

Why are sets in many programming languages not really sets?

Answers (4)

Related Questions