Georg Heiler

Reputation: 17676

Featuretools categorical handling

Featuretools offers integrated functionality for handling categorical variables, as described in https://docs.featuretools.com/loading_data/using_entitysets.html:

variable_types={"product_id": ft.variable_types.Categorical}

However, should these columns be stored as strings or as the pandas category dtype for optimal compatibility with Featuretools?

edit

Also, is it required to manually specify all columns, as in https://github.com/Featuretools/predict-appointment-noshow/blob/master/Tutorial.ipynb, or will the types be inferred automatically from the matching pandas dtypes?

import featuretools.variable_types as vtypes
variable_types = {'gender': vtypes.Categorical,
                  'patient_id': vtypes.Categorical,
                  'age': vtypes.Ordinal,
                  'scholarship': vtypes.Boolean,
                  'hypertension': vtypes.Boolean,
                  'diabetes': vtypes.Boolean,
                  'alcoholism': vtypes.Boolean,
                  'handicap': vtypes.Boolean,
                  'no_show': vtypes.Boolean,
                  'sms_received': vtypes.Boolean}

Upvotes: 4

Views: 1246

Answers (1)

Max Kanter

Reputation: 2014

You should use the pandas category dtype when loading your data into Featuretools. Compared to plain strings, this significantly reduces memory usage, because each distinct value is stored once and the rows hold only small integer codes.
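To get a feel for the memory difference, here is a minimal pandas-only sketch. The column name and the values are made up for illustration, and the exact byte counts will vary with your pandas version and data.

```python
import pandas as pd

# Hypothetical column: 300,000 rows but only three distinct product ids.
as_strings = pd.Series(["p-001", "p-002", "p-003"] * 100_000, name="product_id")

# Same values stored as a categorical: one copy of each label plus integer codes.
as_category = as_strings.astype("category")

# deep=True counts the memory of the Python string objects as well.
string_bytes = as_strings.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"string column:   {string_bytes:,} bytes")
print(f"category column: {category_bytes:,} bytes")
```

The saving is largest when a column has few distinct values relative to its length. For a column where nearly every value is unique, the category dtype brings little or no benefit.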

You are not required to manually specify each variable type when loading your data. If a type is not provided, Featuretools will attempt to infer it from the column's pandas dtype.
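Since inference works from the pandas dtypes, it can help to set them explicitly before loading. The sketch below covers only the pandas side, using a made-up frame that borrows a few column names from the question. Which Featuretools type each dtype ends up as depends on your Featuretools version, so treat the mapping in the comments as an assumption and check the result after loading.

```python
import pandas as pd

# Hypothetical sample data; column names borrowed from the question.
df = pd.DataFrame({
    "patient_id": ["a1", "b2", "c3"],
    "gender": ["F", "M", "F"],
    "age": [34, 51, 27],
    "no_show": [0, 1, 0],
})

# Give each column a dtype that matches its meaning.
# Assumption: category columns are picked up as categorical and
# bool columns as boolean when the types are inferred.
df = df.astype({
    "patient_id": "category",
    "gender": "category",
    "no_show": "bool",
})

print(df.dtypes)
```

A column whose intended type cannot be read off its dtype, such as 'age' as Ordinal in the question, would presumably still need an explicit entry in variable_types.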

Upvotes: 3
