I have the following dask dataframe:
import dask.dataframe as dd
import pandas as pd
df = pd.DataFrame({"A": [1, 2, 1, 2, 3, 1, 2, 3, 5], "B": ["a", "b", "c", "c", "a", "b", "b", "a", "c"], "y":[0, 10, 2, 1, 4, 1, 6, 12, 11]})
X = dd.from_pandas(df, npartitions=2)
In the dataframe X, column B holds the categories that I want to encode, and column y holds the target values. This is just an example dataframe. In reality, my dataset has more than 1000 categories.
How can I do leave-one-out encoding on a dask dataframe?
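To make the target concrete: by leave-one-out encoding I mean the usual definition, where each row's category is replaced by the mean of y over the other rows in the same category. Below is a minimal sketch in plain pandas on the example data, not dask. The output column name B_loo is just a name I picked for illustration, and categories with only one row (which would divide by zero) are not handled. I am looking for the dask equivalent of this.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 1, 2, 3, 1, 2, 3, 5],
    "B": ["a", "b", "c", "c", "a", "b", "b", "a", "c"],
    "y": [0, 10, 2, 1, 4, 1, 6, 12, 11],
})

# Per-category sum and row count of y, broadcast back to every row.
grouped = df["y"].groupby(df["B"])
cat_sum = grouped.transform("sum")
cat_count = grouped.transform("count")

# Leave-one-out mean: drop the current row's y from its category total.
# Assumption: every category has at least two rows; a single-row
# category would divide by zero here and is left unhandled.
df["B_loo"] = (cat_sum - df["y"]) / (cat_count - 1)

print(df)
```

For example, the first row is in category "a" with y = 0; the other "a" rows have y = 4 and y = 12, so its encoded value is 8.0.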
Thanks in advance.