Reputation: 17676
Featuretools offers built-in functionality for handling categorical variables, for example:
variable_types={"product_id": ft.variable_types.Categorical}
(see https://docs.featuretools.com/loading_data/using_entitysets.html)
However, should these columns be strings or the pandas category dtype for optimal compatibility with Featuretools?
Also, is it required to manually specify all columns, as in the example below from https://github.com/Featuretools/predict-appointment-noshow/blob/master/Tutorial.ipynb, or will they be inferred automatically from the corresponding pandas dtypes?
import featuretools.variable_types as vtypes

variable_types = {'gender': vtypes.Categorical,
                  'patient_id': vtypes.Categorical,
                  'age': vtypes.Ordinal,
                  'scholarship': vtypes.Boolean,
                  'hypertension': vtypes.Boolean,
                  'diabetes': vtypes.Boolean,
                  'alcoholism': vtypes.Boolean,
                  'handicap': vtypes.Boolean,
                  'no_show': vtypes.Boolean,
                  'sms_received': vtypes.Boolean}
Upvotes: 4
Views: 1246
Reputation: 2014
You should use the pandas category dtype when loading your data into Featuretools. For columns with many repeated values, this uses significantly less memory than storing the values as strings.
You are not required to manually specify each variable type when loading your data. If a type is not provided, Featuretools will attempt to infer it from the pandas dtype.
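To illustrate the memory point, here is a minimal sketch using pandas only. It does not call Featuretools, and the column name and values are made up for the example. It compares the memory footprint of a low-cardinality column stored as strings with the same column converted to the category dtype:

```python
import pandas as pd

# Hypothetical data: three distinct product IDs repeated many times.
df = pd.DataFrame({"product_id": ["A100", "B200", "C300"] * 10_000})

# Memory used when the column holds plain strings.
as_strings = df["product_id"].memory_usage(deep=True)

# Memory used after converting to the category dtype, which stores
# each distinct value once plus a small integer code per row.
as_category = df["product_id"].astype("category").memory_usage(deep=True)

print(f"strings:  {as_strings} bytes")
print(f"category: {as_category} bytes")
```

The saving depends on how often values repeat. For a column where nearly every value is unique, such as an ID column, converting to category may not help.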
Upvotes: 3