victoria55

Reputation: 245

How to cast a Dask Series as string type if it contains an unhashable type?

I would like to call .value_counts() on an arbitrary Dask Series, casting the Series to string type only if it contains an unhashable type. I don't want to cast the Series to string when I don't have to, and I don't want to call .compute() before calling .value_counts(). I have tried:

import dask
import dask.dataframe as dd
import pandas as pd

df = pd.DataFrame({"a":[[1], ["foo"], ["foo", "bar"]]})
df = dd.from_pandas(df, npartitions=1)
srs = df["a"]

try:
    val_counts = srs.value_counts()
except TypeError:
    srs = srs.astype(str)
    val_counts = srs.value_counts()

val_counts.compute()

which gives the error:

TypeError: unhashable type: 'list'

I also tried wrapping the same logic in a function and passing the result to dask.compute:

df = pd.DataFrame({"a":[[1], ["foo"], ["foo", "bar"]]})
df = dd.from_pandas(df, npartitions=1)
srs = df["a"]

def func(srs):
    try:
        val_counts = srs.value_counts()
    except TypeError:
        srs = srs.astype(str)
        val_counts = srs.value_counts()
    return val_counts

val_counts = dask.compute(func(srs))

which gives the same error.

I have also tried checking for lists explicitly:

df = pd.DataFrame({"a":[[1], ["foo"], ["foo", "bar"]]})
df = dd.from_pandas(df, npartitions=1)
srs = df["a"]

if srs.apply(lambda y: isinstance(y, list), meta=srs).any():
    srs = srs.astype(str)

srs.value_counts().compute()

which gives the error:

TypeError: Trying to convert dd.Scalar<series-..., type=str> to a boolean value.

Upvotes: 1

Views: 687

Answers (1)

MRocklin

Reputation: 57319

Maybe convert the lists into something hashable, like tuples, first?

srs.apply(tuple).value_counts()
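
The reason this helps can be seen in plain Python, without Dask. In the attempts above, .value_counts() only builds a lazy task graph, so the TypeError is raised later, inside .compute(), which sits outside the try block. A per-element conversion avoids the need to catch anything. The sketch below is only an illustration: to_hashable is a hypothetical helper (not part of Dask or pandas), the sample values are the question's data plus one repeated entry, and collections.Counter stands in for value_counts.

```python
from collections import Counter


def to_hashable(value):
    """Hypothetical helper: return a hashable stand-in for value."""
    # Lists (and tuples, which may contain lists) become tuples, recursively.
    if isinstance(value, (list, tuple)):
        return tuple(to_hashable(item) for item in value)
    # Anything else that is already hashable is returned unchanged.
    try:
        hash(value)
    except TypeError:
        # Last resort for other unhashable objects (dicts, sets, ...).
        return str(value)
    return value


# Illustrative data: the question's lists, with ["foo"] repeated once.
values = [[1], ["foo"], ["foo", "bar"], ["foo"]]

# Counting the raw lists fails, because lists cannot be used as dict keys.
try:
    Counter(values)
    raw_counting_failed = False
except TypeError:
    raw_counting_failed = True

# Counting the converted values works.
counts = Counter(to_hashable(v) for v in values)
print(raw_counting_failed, counts[("foo",)])
```

With a Dask Series, the same helper could presumably be passed per element, for example srs.apply(to_hashable, meta=("a", "object")).value_counts(), assuming apply accepts meta as in the question's third attempt. Hashable values pass through unchanged, so only the problematic elements are converted.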

Upvotes: 2
