JRogerC
JRogerC

Reputation: 678

Lazy, but persisted, evaluation of java 8 lambda

I am currently working on creating my own persistent array in java, which uses a binary search tree to store an collection of values.

I want to add a map method that takes a Function as an argument to generate a new array. I don't want to evaluate the functions unless the particular value is requested. This is pretty simple to do as lambdas are lazy evaluated. However, I also only want a function to be evaluated only once, even if the result is requested multiple times.

I could create a node that stores the supplier and updates the result when evaluated:

class Node<T> {

    private T value;
    private Supplier<T> supplier;

    public T get() {
        if (null != value)
            return value;
        value = supplier.get();
        return value;
    }
}

...where supplier is derived from a Function being applied to a value in an older version of the persisted array.

However, this is no longer a functional approach, and has the potential to cause errors in a multi-threaded system*. It also doesn't yield an advantage in the case where supplier returns a null value**.

Another approach would be return an instance of Node on a get call:

class Node<T> {

    private final Optional<T> value;
    private final Supplier<T> supplier;

    Node(Supplier<T> supplier, T value) {
        this.supplier = supplier;
        this.value = Optional.ofNullable(value);
    }

    public Tuple<Node<T>, T> get() {
        if (null != value)
            return new Tuple<>(this, value.orElse(null));
        T result = supplier.get();
        Node<T> newNode = new Node<>(null, result);
        return new Tuple<>(newNode, result);
    }
}

I like this approach for staying functional; but it would require a lot of overhead in all the parent nodes going up the tree for just a get call. And it would require cumbersome unboxing in the using application's code.

Does anyone have another approach they can think of to make this work the way I'm asking? Thanks.

*This could be solved using locking mechanisms, but adds a layer of complexity I'm hoping to avoid.

**I've thought about making value an Optional<T>, where null means hasn't been evaluate, and Optional.empty() as has been evaluated and returns a null. However, this works around my issue, rather than solving it.

For anyone not familiar with a persisted array, it's a data structure that creates a new instance of itself whenever an update is performed. This allows it to be purely immutable. Using a binary tree (or more common 32-bit tree) method allows the updates to reduce duplicating data, both in speed and memory.

EDIT:

The code for the collection can be found on github. For description of use, you can look in the test folder.

Upvotes: 5

Views: 2119

Answers (3)

Lavish Kothari
Lavish Kothari

Reputation: 2331

Disclamer: this is not a direct solution to what you are asking, but this answer presents a solution that's not mentioned in other answers and is worth trying.

You can use Suppliers#memoize from Google-guava library.

It will save you from the problem that you are facing when your Supplier returns null and it's also thread-safe.

Also note that the Supplier's memoize method returns com.google.base.Supplier which extends java.util.Supplier so you can use it to assign it to java.util.Supplier so that you don't force your clients (who will use your library) to depend on Guava library.

Upvotes: 1

fps
fps

Reputation: 34470

Disclaimer: this answer doesn't respond the question directly, as it doesn't use neither Supplier nor Optional directly in the Node class. Instead, a generic functional programming technique is presented that might help solve the problem.


If the problem is all about evaluating the function only once for each input value, then you shouldn't change your tree/array/nodes. Instead, memoize the function, which is a pure functional approach:

In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again

Here's a way to do it, inspired by this excellent article written by Pierre-Yves Saumont (please check it for an in-depth introduction to memoization):

public static <T, U> Function<T, U> memoize(Function<T, U> function) {
    Map<T, U> cache = new ConcurrentHashMap<>();
    return input -> cache.computeIfAbsent(input, function::apply);
}

Suppose you have a method that takes quite long to execute. Then, you could use the memoize method this way:

// This method takes quite long to execute
Integer longCalculation(Integer x) {
    try {
        Thread.sleep(1_000);
    } catch (InterruptedException ignored) {
    }
    return x * 2;
}

// Our function is a method reference to the method above
Function<Integer, Integer> function = this::longCalculation;

// Now we memoize the function declared above
Function<Integer, Integer> memoized = memoize(function);

Now, if you call:

int result1 = function.apply(1);
int result2 = function.apply(2);
int result3 = function.apply(3);
int result4 = function.apply(2);
int result5 = function.apply(1);

You'll notice that the five calls take ~5 seconds altogether (1 second for each call).

However, if you use the memoized function with the same input values 1 2 3 2 1:

int memoizedResult1 = memoized.apply(1);
int memoizedResult2 = memoized.apply(2);
int memoizedResult3 = memoized.apply(3);
int memoizedResult4 = memoized.apply(2); // <-- returned from cache
int memoizedResult5 = memoized.apply(1); // <-- returned from cache

You'll notice that now the five calls take ~3 seconds altogether. This is because the last two results are immediately returned from the cache.


So, back to your structure... Inside your map method, you could just memoize the given function and use the returned memoized function instead. Internally, this will cache the function's return values in the ConcurrentHashMap.

As the memoize method uses a ConcurrentHashMap internally, you don't need to worry about concurrency.

Note: This is just the beginning... I'm thinking about two possible improvements here. One would be to limit the size of the cache, so that it doesn't take the whole memory if the domain of the given function is too big. The other improvement would be to memoize the given function only if it hasn't been memoized previously. But these details are left as an exercise for the reader... ;)

Upvotes: 4

holi-java
holi-java

Reputation: 30696

I also only want a function to be evaluated only once, even if the result is requested multiple times.

How about this?

class Node<T> {
    private Supplier<T> supplier;

    Node(T value, Supplier<T> supplier) {
        this.supplier = sync(lazy(value, supplier));
    }

    public T get() {
        return supplier.get();
    }
}

the sync method only synchronizing the Supplier once, when the target is called, the lock is disabled for the next continuous requests:

static <T> Supplier<T> sync(Supplier<T> target) {
    return sync(new ReentrantLock(), target);
}

static <T> Supplier<T> sync(ReentrantLock lock, Supplier<T> target) {
    //     v--- synchronizing for multi-threads once
    return lazy(() -> {
        // the interesting thing is that when more than one threads come in here
        // but target.delegate is only changed once
        lock.lock();
        try {  
            return target.get();
        } finally {
            lock.unlock();
        }
    });
}

the lazy method calls a given Supplier only once as below:

static <T> Supplier<T> lazy(T value, Supplier<T> defaults) {
    return lazy(() -> value != null ? value : defaults.get());
}

static <T> Supplier<T> lazy(Supplier<T> target) {
    return new Supplier<T>() {
        private volatile Supplier<T> delegate = () -> {
            T it = target.get();
            //v--- return the evaluated value in turn
            delegate = () -> it;
            return it;
        };

        @Override
        public T get() {
            return delegate.get();
        }
    };
}

IF you interested in how the final code was made, I have commit the code into github, you can just copy and use it. and you can found I have renamed lazy methods to once, which is more expressiveness.

Upvotes: 2

Related Questions