Reputation: 1065
My problem is much like the previous post Optimal HashSet Initialization (Scala | Java), where I want to use HashSet
for speedups (currently i am using Set
) but the HashSet
does not exhibit its (Constant time)advantages.
For the solution mentioned:
You can minimize the cost of equals by interning. This means that you acquire new objects of the class through a factory method, which checks whether the requested new object already exists, and if so, returns a reference to the existing object. If you assert that every object of this type is constructed in this way you know that there is only one instance of each distinct object and equals becomes equivalent to object identity, which is a cheap reference comparison (eq in Scala).
However, I am not quite sure what's the efficient way to check
whether the requested new object already exists
for large objects (e.g. objects of case class with parameter of hashmap, some other object structures...etc)
By comparing each of those complicated fields do not give out much performance advantage, isn't it? Or if it is, are there other ways?
In addition, I'm also confused that how to make
equals becomes equivalent to object identity, which is a cheap reference comparison (eq in Scala).
in code.
The intening technique mentioned above, I think, is basically an object cache. Therefore, I reference to the technique mentioned in the post Caching strategy for small immutable objects in Java?. However, I still do not see what's the efficient way for large objects.
For convenience, I quoted the caching technique (in Java) from the post with ///
denoting my thoughts and questions:
private static final int N_POINTS = 10191;
private static final Point[] POINTS = new Point[N_POINTS];
public static Point of(int x, int y, int z) {
int h = hash(x,y,z); /// I can use hash code of each complicated field to construct the value
int index = (h & 0x7fffffff) % N_POINTS;
Point p = POINTS[index];
if (p != null && p.x == x && p.y == y && p.z == z) /// Not working for large objects?
return p;
return POINTS[index] = new Point(x,y,z);
}
To summarize, what's the best practice to implement efficient caching strategy for large objects, so that I can take advantage of HashSet
in Scala?
Thanks,
Upvotes: 3
Views: 1049
Reputation: 8463
The goal of interning is to enable the equals
method to be implemented using reference equality as:
this eq that
(or this == that
in Java).
It's clear that this implementation will have optimal run-time characteristics over
the more traditional equals
that compares some set of fields.
This comparison is only effective if there is one and only one instance of each "unique object" as determined by some set of fields of the object.
Intern-ing is effective only in so far as
the up-front cost of the intern operation
can be fully offset by the minimized cost of the (possibly many) calls to equals
,
driven by HashMap
.
As you've noted, this intern-ing may require a potentially costly caching mechanism: there is a run time overhead (performing the check) and a memory overhead (the size of the cache).
The most straight forward way to cache is with HashMap
, and a traditional equals
. hashCode
should be lazy; caching it's result so it doesn't need to be recomputed.
Threading issues may need to be considered.
One way to implement such a cache is to use a trie, perhaps implemented with a hash table at each node, and where each "level" corresponds to a field in the object (first level - field 1, second level, field 2, etc...) for the "set of fields used to establish uniqueness."
There are other viable ways to implement such a cache. Forgive me for avoiding any further discussion of such, and allow me instead to present ways to avoid dealing with the issue.
Claim: You're likely to have sufficiently effective results by
using a fast hash (and caching it internally),
a "traditional" implementation of equals
,
and by starting with a HashMap
or HashSet
of sufficient minimal size
Ideally there are few collisions in the hash table,
and the number of calls to equals
is minimal.
String
[This method assumes that the "fields that uniquely define the object" are immutable. Suitable adjustments can be made to compensate if this is not true.]
Build & cache a private unique: String
that corresponds to the unique object instance.
For example, the following may be unique for some simple objects:
The concatenation of the string values of the "set of fields used to establish uniqueness," separated by commas.
An understanding of your object/field characteristics will help determine how such a unique string can be created.
With such a value, we can avoid the separate intern/caching mechanism
and retain much of the benefits by implementing both equals
and hashCode
in terms of this unique
string:
def equals(thatObj: Any) = thatObj match {
case that : MyType => unique.equals(that.unique)
case _ => false
}
def hashCode() = unique.hashCode
[EDIT: Rüdiger Klaehn provided this link which offers compelling evidence to avoid String.intern()]
Use String.intern
and adjust the equals
to take advantage of it:
private val unique = buildUnique().intern
def equals(thatObj: Any) = thatObj match {
case that : MyType => unique.eq(that.unique) // In Java: unique == that.unique
case _ => false
}
Upvotes: 4