freebie

Truth of Delayed objects is not supported

I'm using dask to delay the computation of some functions in my codebase that return Series. So far most operations behave as expected, apart from my use of np.average.

My function returns a pd.Series, on which I then want to compute a weighted average.

Below are a non-dask version and a dask version:

import dask
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])
a = np.average(s, weights=s)  # plain pandas/NumPy: works
print(a)

ds = dask.delayed(lambda: s)()  # a Delayed that produces s when computed
a = np.average(ds, weights=ds)  # raises TypeError
print(a.compute())

The np.average call raises a TypeError: Truth of Delayed objects is not supported.

I'm not sure which part of my usage is wrong here.

Answers (1)

mdurant

The problem is that you are calling a NumPy function, np.average, on a dask Delayed object. NumPy has no idea what to do with a Delayed object, so you get an error. The solution is to delay the NumPy function as well.
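A minimal sketch of where the message comes from: a Delayed is only a placeholder for a value that has not been computed yet, so it refuses to be converted to True/False, while ordinary arithmetic on it is simply recorded for later. The assumption here is that np.average performs some such truth test internally when handed a Delayed; the exact spot inside NumPy is not shown.

```python
import dask

# A Delayed is a placeholder for a result that has not been computed yet.
d = dask.delayed(lambda: 3)()

# Asking for a concrete True/False fails. (Assumption: np.average ends up
# doing a check like this somewhere when it receives a Delayed.)
try:
    bool(d)
    message = "no error"
except TypeError as exc:
    message = str(exc)

print(message)

# Arithmetic, by contrast, is recorded lazily and only runs on compute().
total = (d + 1).compute()
print(total)
```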

You can do the following:

a = dask.delayed(np.average)(ds, weights=ds)  # np.average now runs on the computed Series
a.compute()
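A self-contained version of the fix, using the Series from the question. dask resolves Delayed arguments, including keyword arguments such as weights, before calling the wrapped function, so np.average only ever sees a real pd.Series:

```python
import dask
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])
ds = dask.delayed(lambda: s)()

# Wrap np.average itself in dask.delayed; it is called on the computed
# Series instead of on the Delayed placeholder.
a = dask.delayed(np.average)(ds, weights=ds)
result = a.compute()

# Weighted average with the values as their own weights:
# (1*1 + 2*2 + 3*3) / (1 + 2 + 3) = 14 / 6
print(result)
```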

This works (you get the answer), but it may well not be what you were after. np.average is called once on the whole Series: you do get lazy evaluation, and you may get parallelism if you have many such computations. However, I'd say it is fairly unusual to pass around delayed pandas Series like this.
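The "many such computations" case might look like the sketch below. load_series is a hypothetical stand-in for a function in your codebase that returns a pd.Series; dask.compute evaluates all the delayed results in one call, so independent tasks can run in parallel.

```python
import dask
import numpy as np
import pandas as pd


@dask.delayed
def load_series(n):
    # Hypothetical stand-in for a codebase function returning a pd.Series.
    return pd.Series(range(1, n + 1))


# A delayed version of np.average, reused for every input.
weighted_average = dask.delayed(np.average)

averages = []
for n in (3, 4, 5):
    s = load_series(n)
    averages.append(weighted_average(s, weights=s))

# Compute all results together; for delayed objects the default
# scheduler is thread-based, so independent tasks may overlap.
results = dask.compute(*averages)
print(results)
```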

You may want to read up on the high-level array and dataframe interfaces (dask.array and dask.dataframe), where the logic of splitting Series and arrays into chunks is done for you.
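As a sketch of the array route, the weighted average can be written as ordinary array arithmetic on a chunked dask.array; the chunk size of 2 is an arbitrary choice for illustration. Each operation builds a lazy expression, and compute() evaluates it across the chunks.

```python
import dask.array as da
import pandas as pd

s = pd.Series([1, 2, 3])

# Build a chunked dask array from the Series values; chunks=2 splits the
# three values into blocks of sizes 2 and 1.
x = da.from_array(s.to_numpy(), chunks=2)

# Weighted average with the values as their own weights: sum(x * w) / sum(w).
weighted = (x * x).sum() / x.sum()
result = weighted.compute()
print(result)
```

dask.dataframe supports the same kind of expression on Series (for example after dask.dataframe.from_pandas), which stays closer to pandas-based code like the question's.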
