Cheok Yan Cheng
Cheok Yan Cheng

Reputation: 42642

Is there any data structure which can avoid duplication, preserve order and random access

Previously, while I am looking for the data structure with the following characteristic.

In Java, I use LinkHashSet, In Python, I use OrderedDict

Now, on the top of 2 requirements, I would like to have an additional requirement

Is there any data structure available? Or I need to fall back to use List? List able to full fill 2nd and 3rd requirement, but not 1st. I may need to perform manual (and slow check) during insertion, to avoid duplication?

Upvotes: 0

Views: 5740

Answers (5)

Kevin Bourrillion
Kevin Bourrillion

Reputation: 40851

If you can use an immutable collection, use an ImmutableSet from Guava, which has an asList() view to provide indexed access.

Upvotes: 1

Stephen C
Stephen C

Reputation: 718678

A simple approach in Java would be to create a wrapper class that implements both the Set and List interfaces, and that contained both a HashSet and ArrayList. Update operations would need to update both internal collections, and read operations would be mapped to whichever internal collection gives the correct semantics and best performance. The only slightly tricky method is iterator() where you need to arrange that remove updates both collections.

This approach will give you the "best of both worlds" performance for read operations, but updates will necessarily be slower. In particular, insert at a given position and remove will be O(N) operations.

(I'd note that LinkedHashSet is not a direct solution since it does not provide a get(int) method. You could implement this method via the LinkedHashSet iterator, by that makes it an O(N) operation. Probably not what you want.)

FOLLOW UP

I've not been able to find a general purpose implementation class that implements both the Set and List interfaces. I think that the reason is that there are semantic anomalies when you combine the interfaces. For example, (as @ColinD notes) if you call E set(int, E) with an element that it already in the list, it is not clear what the result should be. Dealing with this in a way that satisfies everyone is probably impossible, and I can understand why they might have decided not to go swimming in the tarpit.

However, I don't see this as a major problem if you are creating a Set + List class for your application's internal use. You either

  • pick a semantic for that suits your application,
  • code your application to not use the method at all, or
  • code your application to avoid being bitten by the anomaly.

(For instance, you might code it to either ignore result of the set method, to throw an unchecked exception if there is a duplicate, or return null or some distinguished object if there is a duplicate.)

For the record, it is not unforgivable for a custom collection class to violate the interface contract. Indeed, even the Java designers do it - see IdentityHashMap. What is unforgivable is not documenting the contract violations in the javadocs.

Upvotes: 3

Glenn Maynard
Glenn Maynard

Reputation: 57464

You're not going to find a basic data structure that does this; the goals you're looking for rule out all of them. You might find an more esoteric one that'll do it, but the easiest approach is to use a compound data structure, maintaining two data structures in parallel.

That's what collections.OrderedDict does under the hood, in fact. That's not what you want, though: since it's not designed to support indexing, it uses a linked list under the hood for preserving order. Linked lists can't do indexing--short of a slow, linear scan, which you usually want to avoid since it tends to turn O(n^2) on you if used in a loop.

Here's a simple implementation. It maintains two data structures: a list, preserving the order of items as they're set, and a dict, for quick lookup by key. Both hold the value, and both hold the other's key: the dict holds the index within the list, and the list holds the key within the dict. This makes it easy to reference each data structure from the other, so it can handle both assignment and iteration efficiently.

Note that this doesn't implement every operation, just the basic ones: dict-style assignment a['x'] = 1, dict-style lookup a['x'], list-style assignment a.set_value_by_index(0, 1) and list-style lookup a.get_value_by_index(0).

Also note carefully: this doesn't use the same syntax for dict-style and list-style operations. That's confusing and evil, and will bite you badly sooner or later. This doesn't turn a[0] into a list-style lookup; if that's what you want, be explicit and use get_value_by_index. Don't be magic and try to guess based on the parameter type.

Finally, it provides simple dict-style iteration, yielding the keys as a dict does. Implementing things like iteritems and itervalues or Python3 views are obvious extensions.

class IndexableUniqueList(object):
    """
    >>> a = IndexableUniqueList()
    >>> a['x'] = 1
    >>> a['x']
    1
    >>> a['y'] = 2
    >>> a['y']
    2
    >>> a.get_key_by_index(0)
    'x'
    >>> a.get_value_by_index(0)
    1
    >>> a.get_key_by_index(1)
    'y'
    >>> a.get_value_by_index(1)
    2
    >>> a['x'] = 3
    >>> a.get_key_by_index(0)
    'x'
    >>> a.get_value_by_index(0)
    3
    >>> a.set_value_by_index(0, 4)
    >>> a['x']
    4
    >>> [val for val in a]
    ['x', 'y']
    """
    def __init__(self):
        self.items_by_index = []
        self.items_by_key = {}

    def __getitem__(self, key):
        return self.items_by_key[key][1]

    def __setitem__(self, key, value):
        if key in self.items_by_key:
            idx, old_value = self.items_by_key[key]
            self.items_by_key[key] = (idx, value)
            self.items_by_index[idx] = (key, value)
            return

        idx = len(self.items_by_index)
        self.items_by_key[key] = (idx, value)
        self.items_by_index.append((key, value))
    def get_key_by_index(self, idx):
        return self.items_by_index[idx][0]
    def get_value_by_index(self, idx):
        key = self.get_key_by_index(idx)
        return self.items_by_key[key][1]
    def set_value_by_index(self, idx, value):
        key = self.items_by_index[idx][0]
        self[key] = value
    def __iter__(self):
        for key, value in self.items_by_index:
            yield key

Upvotes: 0

Pascalius
Pascalius

Reputation: 14659

The LinkedHashSet class provides the toArray-Method, which should fit your needs.

Upvotes: 0

Leo Izen
Leo Izen

Reputation: 4290

java.util.Set doesn't provide random access methods like get() and set(), so most/all of its implementations don't either. You could create your own implementation of Set that will provide this, maybe with an ArrayList to hold the data.

Upvotes: 0

Related Questions