Reputation: 245
I would like to call .value_counts() on an arbitrary Dask Series, and I want to cast the Series to string if it contains an unhashable type. I don't want to cast the Series to string if I don't have to. I also don't want to call .compute() before calling .value_counts(). I have tried
import pandas as pd
import dask.dataframe as dd

df = pd.DataFrame({"a": [[1], ["foo"], ["foo", "bar"]]})
df = dd.from_pandas(df, npartitions=1)
srs = df["a"]
try:
    val_counts = srs.value_counts()
except TypeError:
    srs = srs.astype(str)
    val_counts = srs.value_counts()
val_counts.compute()
which gives the error
TypeError: unhashable type: 'list'
And
import dask

df = pd.DataFrame({"a": [[1], ["foo"], ["foo", "bar"]]})
df = dd.from_pandas(df, npartitions=1)
srs = df["a"]

def func(srs):
    try:
        val_counts = srs.value_counts()
    except TypeError:
        srs = srs.astype(str)
        val_counts = srs.value_counts()
    return val_counts

val_counts = dask.compute(func(srs))
which gives the same error.
I have also tried
df = pd.DataFrame({"a": [[1], ["foo"], ["foo", "bar"]]})
df = dd.from_pandas(df, npartitions=1)
srs = df["a"]
if srs.apply(lambda y: isinstance(y, list), meta=srs).any():
    srs = srs.astype(str)
srs.value_counts().compute()
which gives the error
TypeError: Trying to convert dd.Scalar<series-..., type=str> to a boolean value.
Upvotes: 1
Views: 687
Reputation: 57319
Maybe convert lists into something hashable like a tuple first?
s.apply(tuple).value_counts() ?
Upvotes: 2