Reputation: 532
I'm looking for a way to add an interaction term in Lasso / LassoCV in scikit-learn. If the interaction is between two continuous variables or between two categorical variables, I can add columns formed by multiplying the corresponding elements. But when the interaction is between a categorical variable and a continuous variable, I cannot multiply them.
Upvotes: 1
Views: 1814
Reputation: 61967
You can absolutely take the interaction between a categorical variable and a continuous variable, but you must first turn the categorical variable into numeric columns. There are a few ways to do this; a common one is to create a binary column for each unique category. Once you have built the new matrix, you can pass it to the fit method in sklearn. See my very minimal example below.
# create data with categorical and continuous variables
import pandas as pd
df = pd.DataFrame({'cat':['a','b','c'], 'cont':[4,1,10]})
Output
  cat  cont
0   a     4
1   b     1
2   c    10
Use the pandas function get_dummies to create binary variables
df_new = pd.get_dummies(df)
Output of transformed data
   cont  cat_a  cat_b  cat_c
0     4      1      0      0
1     1      0      1      0
2    10      0      0      1
Now you can build the interaction with a simple element-wise multiplication on the transformed frame
df_new['a_new'] = df_new['cont'] * df_new['cat_a']
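To tie this back to the question, here is a rough sketch of building one interaction column per category and passing the result to a Lasso model. The extra rows, the target values in y, the `_x_cont` column suffix, and alpha=0.1 are all made up for illustration. Plain Lasso is used only because the toy data is too small for cross-validation; LassoCV takes the same X and y.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Toy data (made up for illustration): one categorical and one continuous predictor
df = pd.DataFrame({
    'cat': ['a', 'b', 'c', 'a', 'b', 'c'],
    'cont': [4.0, 1.0, 10.0, 2.0, 7.0, 3.0],
})
y = np.array([1.0, 2.0, 3.0, 1.5, 2.5, 3.5])  # hypothetical target values

# One binary column per category; dtype=int keeps them as 0/1
dummies = pd.get_dummies(df['cat'], prefix='cat', dtype=int)

# Multiply every dummy column by the continuous column, row by row
interactions = dummies.mul(df['cont'], axis=0).add_suffix('_x_cont')

# Design matrix: main effects plus the categorical-by-continuous interactions
X = pd.concat([df[['cont']], dummies, interactions], axis=1)

# alpha is an arbitrary choice here, not a recommendation
model = Lasso(alpha=0.1).fit(X, y)
```

Each `cat_*_x_cont` column equals cont on the rows belonging to that category and 0 elsewhere, so the model can fit a separate slope per category.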
Upvotes: 2