Reputation: 390
I want to test whether a Dask DataFrame is empty.
So I have this Dask DataFrame ddf, a local Ray cluster, and Dask configured to use Ray as its backend.
I've seen here that there is no empty property and that I have to use the following check instead:
len(ddf.index) == 0
This raises ValueError: bytesobj cannot be larger than 2147483631 bytes, triggered by the following code (located in blosc.toplevel):
def _check_input_length(input_name, input_len):
    if input_len > blosc.MAX_BUFFERSIZE:
        raise ValueError("%s cannot be larger than %d bytes" %
                         (input_name, blosc.MAX_BUFFERSIZE))
I have also tried to get just one element out of the index, which would obviously answer the question if it's not empty, but both of these trigger the same error:
a = ddf.index.tail(1)
b = ddf.index.head(1)
Upvotes: 1
Views: 165
Reputation: 390
The issue was that the Dask DataFrame had only one partition.
I used ddf.repartition(npartitions=32) to solve my issue.
Repartitioning by size, with partition_size="100MB", is the recommended way to go.
Upvotes: 0