How to handle one categorical variable in multiple columns using tidymodels?

Question

I have a dataset in which one categorical variable is spread across multiple columns, like this:

ID	Pet_1	Pet_2	Pet_3	Siblings	Income	Result
1	dog	horse	cat	0	90000	0
2	cat	bird	NA	1	50000	1
3	NA	NA	NA	3	75000	1
4	horse	dog	snake	1	120000	0

There's an ID column, a set of columns that are really one variable (Pet_1 through Pet_3) where order doesn't matter and values can be missing, other predictor columns (Siblings, Income), and the response (Result).

How can I handle the set of columns that go together using tidymodels? For example, dog in Pet_1 should have the same effect as dog in Pet_3. I was thinking about pulling those columns out, pivoting them longer, running an encoding step, and then aggregating the result back to one row per ID. But I don't think it's possible to aggregate in a recipe step.
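
For reference, here is a minimal sketch of one way this might be done without any manual pivoting or aggregation. It assumes a reasonably recent version of the recipes package that provides step_dummy_multi_choice(), which pools the levels of several nominal columns and creates one indicator column per level. The object names (pets, rec, baked) and the prefix "Pet" are only illustrative choices, and the data is just the four example rows from above.

```r
library(recipes)

# The example data from the question (values copied from the table above)
pets <- tibble::tribble(
  ~ID, ~Pet_1,  ~Pet_2,  ~Pet_3,  ~Siblings, ~Income, ~Result,
  1,   "dog",   "horse", "cat",   0,         90000,   0,
  2,   "cat",   "bird",  NA,      1,         50000,   1,
  3,   NA,      NA,      NA,      3,         75000,   1,
  4,   "horse", "dog",   "snake", 1,         120000,  0
)

rec <- recipe(Result ~ ., data = pets) %>%
  # Keep ID in the data but do not treat it as a predictor
  update_role(ID, new_role = "id") %>%
  # Treat Pet_1..Pet_3 as one multi-choice variable: one 0/1 column per pet,
  # regardless of which of the three columns the pet appeared in.
  # prefix = "Pet" is an illustrative choice for the new column names.
  step_dummy_multi_choice(starts_with("Pet_"), prefix = "Pet")

# Estimate the step on the data and return the processed data
baked <- rec %>%
  prep() %>%
  bake(new_data = NULL)

baked
```

With this, a row gets the same dog indicator whether dog was in Pet_1 or Pet_3, and a row where every pet column is NA (ID 3) should simply have zeros in all of the pet indicators. Because the levels are learned in prep(), the step can sit in a workflow like any other recipe step.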

Answers (1)