QStormDS

Reputation: 53

Thread safety in huge data model

Background: I have a (more or less) huge data model in memory. The model contains around 3,150,000 to 12,600,000 objects that can be modified directly. In addition, there are around 75,000,000 objects that can only be modified via those 3,150,000 to 12,600,000 objects.

On the other hand, there are around 10 modules that access the model. These modules can be grouped into:

Question: How to synchronize such a data model? There are the following ideas in my mind:

(1) A lock in every class that can be directly modified. Advantage: only the objects that are actually modified must be locked. Disadvantage: a high synchronization effort and a huge number of lock instances (3,150,000 to 12,600,000 additional objects/locks), and a great danger of getting the synchronization wrong (deadlocks, etc.).
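
Here is a minimal sketch of what I mean by idea (1), using the Ship and Stock classes from the model in EDIT 2 below (the loadWare method and Stock.add are just made up for illustration):

import java.util.concurrent.locks.ReentrantLock;

class Ship {
    private final ReentrantLock lock = new ReentrantLock();
    private final Stock stock = new Stock();

    void loadWare(Ware ware, int amount) {
        lock.lock();
        try {
            stock.add(ware, amount); // Stock.add(Ware, int) is assumed, not part of the real model
        } finally {
            lock.unlock();
        }
    }
}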

(2) A central interface for accessing the whole data model. This interface would lock the whole model on every modification via a single lock. Advantage: only a single lock, so less synchronization effort. Disadvantage: the whole model is locked regardless of the type of change.
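
A minimal sketch of what I mean by idea (2); the facade class and its method names are just for illustration, and World is assumed to initialize its maps:

import java.util.concurrent.locks.ReentrantReadWriteLock;

class ModelFacade {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final World world = new World();

    Ship findShip(Coordinate at) {
        lock.readLock().lock();
        try {
            return world.ships.get(at);
        } finally {
            lock.readLock().unlock();
        }
    }

    void placeShip(Coordinate at, Ship ship) {
        lock.writeLock().lock();
        try {
            world.ships.put(at, ship);
        } finally {
            lock.writeLock().unlock();
        }
    }
}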

(3) A dispatch thread (like in AWT/Swing), i.e. a thread that processes tasks (events). Advantages/disadvantages are like those of idea (2), but this would be an event-based solution. I read Graham Hamilton's article about multi-threading in GUI toolkits. In addition, there is a great talk, "Events versus Threads", by John Ousterhout. Of course my data model isn't that extensive, but the article gets to the heart of the matter.

Here is the link to Graham Hamilton's article: https://weblogs.java.net/blog/kgh/archive/2004/10/multithreaded_t.html
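
A minimal sketch of what I mean by idea (3): all modifications are submitted as tasks to a single-threaded executor, so only the dispatch thread ever touches the model (the placeShip method is just for illustration):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ModelDispatcher {
    private final ExecutorService modelThread = Executors.newSingleThreadExecutor();
    private final World world = new World();

    // every mutation is queued as a task instead of touching the model directly
    Future<?> submit(Runnable task) {
        return modelThread.submit(task);
    }

    void placeShip(Coordinate at, Ship ship) {
        submit(() -> world.ships.put(at, ship));
    }
}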

So, what are your experiences? Maybe you have a better idea.

EDIT: I made a big mistake in the object calculation. I have just updated the numbers.

Thanks in advance :)

EDIT 2: Here is a model I just created for demonstration purposes:

enum Ware { WOOD, COAL, STONE }
class Stock { Map<Ware, Integer> internalStock; }
class Coordinate { int x; int y; }
interface ILand {}

class World {
  Map<Coordinate, ILand> land;
  Map<Coordinate, Ship> ships;
}

class Island implements ILand { Stock resources; }
class Ship { Stock stock; }
class Building { Stock stock; }

class Colony implements ILand {
  Island builtOn;
  Set<Building> building;
}

class Character {
  Set<Colony> colonies;
  Set<Ship> fleet;
}

This would be the structure of the data model:

Model
   World     <>--- ILand
             <>--- Ship
   Character <>--- Colony <>--- Building <>--- Stock
                          <>--- Island   <>--- Stock
             <>--- Ship   <>--- Stock

Upvotes: 4

Views: 498

Answers (5)

Andrey Chaschev

Reputation: 16496

a) *This is a solution which was used in a real social game.* If you can come up with a key for your objects, or with proper equals/hashCode functions, you can put them into a ConcurrentHashMap. Each entry present in this map marks the corresponding object as locked. This results in roughly 40 bytes of overhead per entity.
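
A minimal sketch of option a), assuming a Long id as the key; storing the owning thread as the value is an extra detail so that only the owner can unlock:

import java.util.concurrent.ConcurrentHashMap;

class LockRegistry {
    private final ConcurrentHashMap<Long, Thread> locked = new ConcurrentHashMap<>();

    boolean tryLock(long objectId) {
        // putIfAbsent is atomic: only the first caller gets null back and owns the lock
        return locked.putIfAbsent(objectId, Thread.currentThread()) == null;
    }

    void unlock(long objectId) {
        // remove(key, value) only removes the entry if the current thread is the owner
        locked.remove(objectId, Thread.currentThread());
    }
}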

b) You can optimize the previous solution by coming up with another hash function that splits all your objects into buckets of some reasonable size, e.g. 100 elements (you could measure the right size by running tests). In this case a whole bucket is locked at a time, which saves you some extra bytes. This results in roughly 12 bytes of overhead per entity to store the elements in buckets (e.g. in an ArrayList).
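
A minimal sketch of option b); the bucket count is an assumption to be tuned by measurement:

import java.util.concurrent.locks.ReentrantLock;

class BucketLocks {
    private final ReentrantLock[] buckets;

    BucketLocks(int bucketCount) {
        buckets = new ReentrantLock[bucketCount];
        for (int i = 0; i < bucketCount; i++) buckets[i] = new ReentrantLock();
    }

    // all objects hashing to the same bucket share one lock
    ReentrantLock lockFor(Object entity) {
        int index = Math.floorMod(entity.hashCode(), buckets.length);
        return buckets[index];
    }
}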

c) Third option: an AtomicBitSet implementation for Java. This is a variation of the second approach: the buckets are locked via a compact atomic bit set. It is a little better than the second option, and its advantage is that you can have smaller buckets because they require less memory (~40 bytes per bucket in a ConcurrentHashMap vs. a couple of bits per bucket in an AtomicBitSet).
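
The JDK has no AtomicBitSet class, so this is a rough sketch of the idea using an AtomicLongArray, one bit per bucket:

import java.util.concurrent.atomic.AtomicLongArray;

class BitBucketLocks {
    private final AtomicLongArray bits;

    BitBucketLocks(int bucketCount) {
        bits = new AtomicLongArray((bucketCount + 63) / 64);
    }

    boolean tryLock(int bucket) {
        int word = bucket / 64;
        long mask = 1L << (bucket % 64);
        while (true) {
            long current = bits.get(word);
            if ((current & mask) != 0) return false;            // bucket already locked
            if (bits.compareAndSet(word, current, current | mask)) return true;
        }
    }

    void unlock(int bucket) {
        int word = bucket / 64;
        long mask = 1L << (bucket % 64);
        while (true) {
            long current = bits.get(word);
            if (bits.compareAndSet(word, current, current & ~mask)) return;
        }
    }
}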

Locks

A state might be more complex than just locked/not locked. So instead of maintaining a map:

 lock map: objectId -> {true | false}

Or

 lock map: bucket of objectIds -> {true | false}

One could store lock information:

 lock map: objectId -> {ReadWriteLock lock, Thread owner, long writeLockGrantedAtMs}

If there is no entry for an object in this map, nobody is locking it. Otherwise the object is locked with the strategy described by the ReadWriteLock, and writeLockGrantedAtMs can be used to interrupt the owner if it has been holding the write lock for too long.
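
A minimal sketch of this richer lock map; the field names follow the description above, and the surrounding registry class is an assumption:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockInfo {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    volatile Thread owner;
    volatile long writeLockGrantedAtMs;
}

class RichLockRegistry {
    private final ConcurrentHashMap<Long, LockInfo> locks = new ConcurrentHashMap<>();

    LockInfo lockInfoFor(long objectId) {
        // computeIfAbsent creates the entry atomically on first use
        return locks.computeIfAbsent(objectId, id -> new LockInfo());
    }
}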

ADDED

I'm not sure you need this, but deadlocks can be avoided completely if you implement an atomic lock for your entities and order them, e.g. by hash code, before locking. This is done by acquiring the locks sequentially, each with a timeout. A simplified sketch in Java (using ReentrantLock):

// needs: java.util.*, java.util.concurrent.TimeUnit, java.util.concurrent.locks.ReentrantLock
void lockObjects(ReentrantLock f, ReentrantLock e, ReentrantLock a)
        throws InterruptedException {

    // reorder into a global, stable order so every caller locks in the same sequence
    ReentrantLock[] ordered = {f, e, a};
    Arrays.sort(ordered, Comparator.comparingInt(System::identityHashCode));

    List<ReentrantLock> acquired = new ArrayList<>();
    for (ReentrantLock lock : ordered) {
        if (lock.tryLock(10, TimeUnit.MILLISECONDS)) {
            acquired.add(lock);
        } else {
            // release what was already taken before giving up
            for (ReentrantLock held : acquired) held.unlock();
            throw new IllegalStateException("could not lock " + lock);
        }
    }

    // now these objects are locked, deadlocks avoided
}

UPDATE for the data model

I actually implemented structure a) for 3 social games which ran in production for 1-2 years. The resulting solution was a bit more complex and included persistence, monitoring and deadlock resolution, but that was a requirement of those projects and you probably won't need it.

For example, if you want to add a Colony to a Character, you would take the lock for that Character. And you should make sure you always lock your objects, i.e. there is no other way to get at an object than by obtaining its lock.

If you want to add a Colony to six Characters, you can either do this non-atomically, i.e. sequentially add the Colony to each Character (each addition being atomic), or implement an atomic lock and lock all seven objects at once. The difference shows up when there are problems with the locks: in the first case you might end up with a partial update, in the second case you get a bigger delay.
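
A minimal sketch of the "lock before you touch it" rule for a single Character, reusing the LockRegistry sketch from option a) above; the characterId parameter is an assumption:

void addColonyToCharacter(LockRegistry registry, long characterId,
                          Character character, Colony colony) {
    if (!registry.tryLock(characterId)) {
        throw new IllegalStateException("character is busy, retry later");
    }
    try {
        character.colonies.add(colony);  // the actual mutation happens only while locked
    } finally {
        registry.unlock(characterId);
    }
}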

Upvotes: 1

Trying

Reputation: 14278

  • I would say the first very important point is to make your classes as immutable as possible. With immutability, thread safety comes for free: you can share your objects without synchronization, and the JMM takes care of safe publication, i.e. you avoid the visibility issue without synchronizing (see the sketch at the end of this answer).

  • As soon as you start thinking about immutable objects, think about the Flyweight pattern, which helps reduce the memory footprint of object creation and also improves performance. Because your classes are immutable and only one instance exists per distinct value, you can cache a lot of information, such as an object's hashCode, and compute it lazily.

  • Instead of synchronized blocks you can use tryLock from the Lock interface, which helps you prevent deadlocks.

  • You can also use a ReadWriteLock: take the read lock for reads and the write lock for writes.

  • You can use global lock ordering to get rid of deadlocks. Suppose you have a method like this:


public void fun(MyClass1 o1, MyClass2 o2) {
    synchronized (o1) {
        synchronized (o2) {
            // ...
        }
    }
}

There is a real possibility of deadlock here if some other code takes the same two locks in the opposite order, so you can use global lock ordering instead:


private static final Object tieBreaker = new Object(); // extra mutex for hash-code ties

public void fun(MyClass1 o1, MyClass2 o2) {
    int h1 = System.identityHashCode(o1);
    int h2 = System.identityHashCode(o2);
    if (h1 > h2) {
        synchronized (o1) {
            synchronized (o2) {
                // ...
            }
        }
    } else if (h1 < h2) {
        synchronized (o2) {
            synchronized (o1) {
                // ...
            }
        }
    } else { // if equal, resolve the tie with the extra mutex
        synchronized (tieBreaker) {
            synchronized (o1) {
                synchronized (o2) {
                    // ...
                }
            }
        }
    }
}

Compute the identity hash codes: if the first is greater than the second use one lock ordering, if the second is greater use the other, and if there is a tie resolve it with the extra mutex (tieBreaker above). This helps a great deal in avoiding deadlocks.

  • You can use concurrent data structures such as ConcurrentLinkedQueue, which uses compare-and-swap machine instructions and is non-blocking and lock-free.

  • Consider using ConcurrentHashMap, which gives better scalability in a multithreaded environment because it uses lock striping.

  • While using the Lock interface, make sure you use a non-fair lock, as it scales well and gives better performance in the real world. Reason: suppose thread A has just finished its job and the next waiting thread B is ready to be given the CPU. Before B gets the CPU, its data has to be brought into the cache, its state has to be changed, and so on. If a thread C asks for the lock at that moment and you have a non-fair lock, C will be served first, which increases overall system throughput: by the time B has finished waking up, C may well have finished its task. It is a win for everyone.
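
As a sketch of the immutability advice in the first point, here is the Coordinate class from the question's model rewritten as an immutable value class (final fields, no setters, value-based equals/hashCode); instances can be shared between threads without any synchronization:

final class Coordinate {
    private final int x;
    private final int y;

    Coordinate(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Coordinate)) return false;
        Coordinate other = (Coordinate) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}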

Upvotes: 0

Alexei Kaigorodov

Reputation: 13535

Variants 2 and 3 reduce parallelism to zero (only one thread can access the data at any moment). Variant 1 maximizes parallelism (indeed, it is one lock per object, not per class), and the synchronization effort is minimal in this case because contention is low. The memory for the locks is dwarfed by the 75 million background objects that exist anyway. Deadlocks can (and should) be avoided by careful synchronization scheduling (avoid cycles in the resource graph).

Upvotes: 0

mikera

Reputation: 106361

You may want to consider making your data model into an immutable persistent data structure.

This approach is used to very good effect in languages like Scala and Clojure. The following video is well worth watching if you want to understand this technique better:

http://www.infoq.com/presentations/Value-Identity-State-Rich-Hickey

This is often a good strategy when you have significant concurrent access, and it has various advantages:

  • Readers don't require any locking. This can be a big performance win in situations where there are many readers.
  • Updates can happen atomically - this means that readers never run the risk of seeing the data in an inconsistent state.
  • You can take a "snapshot" of the whole data structure at any time. Since an immutable data structure can't change, you are free to grab a reference to it and examine it at leisure (see the sketch below).
  • Updates are still relatively cheap: structural sharing means that you can make a new version of the data model with a few changes without copying the whole data model.
  • You can have various different update semantics that suit your requirements. In this case, it sounds like you have a "read and update" semantic coupled with some form of change notification.
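
A minimal sketch of the idea, assuming the whole model is an immutable value held in an AtomicReference and updates swap in a new version; ImmutableWorld and withShip are hypothetical names, not part of the question's model:

import java.util.concurrent.atomic.AtomicReference;

class ModelHolder {
    private final AtomicReference<ImmutableWorld> current = new AtomicReference<>();

    // readers get a consistent snapshot with no locking at all
    ImmutableWorld snapshot() {
        return current.get();
    }

    // writers build a new version from the old one and atomically swap it in
    void placeShip(Coordinate at, Ship ship) {
        current.updateAndGet(world -> world.withShip(at, ship));
    }
}

With structural sharing in ImmutableWorld, withShip only copies the path that changed, so each update stays cheap even for a large model.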

Upvotes: 2

brettw

Reputation: 11114

Write some unit tests. For the model, start with high-level locks, such as synchronized methods. If some of the methods only modify maps, not the objects within them, consider using ConcurrentHashMaps for those instead of synchronized methods.
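
For instance, a hedged sketch of that last point using the ships map from the question's model (placeShipIfFree is an invented method, and Coordinate is assumed to implement equals/hashCode):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentWorld {
    final Map<Coordinate, Ship> ships = new ConcurrentHashMap<>();

    // putIfAbsent is atomic, so no synchronized method is needed for this case
    boolean placeShipIfFree(Coordinate at, Ship ship) {
        return ships.putIfAbsent(at, ship) == null;
    }
}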

Upvotes: 0
