guillaume latour

Reputation: 390

blosc.MAX_BUFFERSIZE error while trying to guess if a dask dataframe is empty

I want to test whether a dask dataframe is empty. I have a dask dataframe ddf, a local Ray cluster, and dask configured to use Ray as its backend.

I've seen here that there is no empty property on dask dataframes and that the recommended check is

len(ddf.index) == 0

This raises ValueError: bytesobj cannot be larger than 2147483631 bytes, triggered by the following check (located in blosc.toplevel):

def _check_input_length(input_name, input_len):
    if input_len > blosc.MAX_BUFFERSIZE:
        raise ValueError("%s cannot be larger than %d bytes" %
                         (input_name, blosc.MAX_BUFFERSIZE))
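As an aside, the 2147483631 limit in the error message is not arbitrary: in python-blosc, MAX_BUFFERSIZE is (to the best of my understanding, so treat the constant names as assumptions) the 32-bit signed integer maximum minus blosc's 16-byte header overhead. A quick sanity check:

```python
# Assumption: blosc caps buffers at INT32_MAX minus its 16-byte
# compression-header overhead (BLOSC_MAX_OVERHEAD in c-blosc).
INT32_MAX = 2**31 - 1            # 2147483647
BLOSC_MAX_OVERHEAD = 16
MAX_BUFFERSIZE = INT32_MAX - BLOSC_MAX_OVERHEAD

print(MAX_BUFFERSIZE)            # 2147483631, the limit in the error
```

So the error simply means a single buffer of more than ~2 GiB was handed to blosc for compression.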

I have also tried to pull just one element out of the index, which would be enough to prove the dataframe is not empty, but both of the following trigger the same error:

a = ddf.index.tail(1)
b = ddf.index.head(1)

Upvotes: 1

Views: 165

Answers (1)

guillaume latour

Reputation: 390

The issue was that the dask dataframe had only one partition, so the whole frame was serialized as a single buffer larger than blosc's ~2 GiB limit.

I used ddf.repartition(npartitions=32) to solve my issue.

Passing partition_size="100MB" instead is the generally recommended approach, since it targets a partition size rather than a fixed partition count.

Upvotes: 0
