Leo Torres

Reputation: 690

Use FeatureTools on specific columns only

I am trying to use Featuretools to generate new features from only some specified columns of the Titanic dataset. In my case I want to apply the transform primitives 'add_numeric' and 'multiply_numeric' to Age, PClass and log10SplitFare. I have followed the syntax given here to the best of my knowledge, but to no avail. The code below does not raise an error, but it does not produce any additional columns either. I also used this Stack Overflow link as a reference.

import featuretools as ft

# ftdataset_cleaned: the cleaned Titanic DataFrame (defined elsewhere)
es = ft.EntitySet(id='Titanic')
es.entity_from_dataframe(entity_id='data', dataframe=ftdataset_cleaned,
                         make_index=False, index='index')

# Run deep feature synthesis with transformation primitives
feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity='data',
                                      trans_primitives=['add_numeric', 'multiply_numeric'],
                                      primitive_options={('add_numeric', 'multiply_numeric'): {"include_entities": ['Age', 'PClass', 'log10SplitFare']}}
                                      )

Upvotes: 1

Views: 468

Answers (1)

Frances Hartwell

Reputation: 191

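The include_entities option expects entity ids (here, 'data'), not column names, so a list of column names matches no entity and the primitives are not applied to your data. Use the include_variables option instead to specify which columns in an entity to use for specific primitives: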

feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_entity='data',
    trans_primitives=['add_numeric', 'multiply_numeric'],
    primitive_options={
        ('add_numeric', 'multiply_numeric'): {
            'include_variables': {'data': ['Age', 'PClass', 'log10SplitFare']}}})

This guide goes a little more in depth about the different ways you can control how primitives are applied.
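
As a rough mental model of what restricting these two primitives to three columns asks for, the plain-Python sketch below computes the sum and product of each unordered pair of the included columns by hand. It does not call Featuretools: pairwise_features and the toy row are hypothetical, and the names and number of features Featuretools actually generates may differ with its version and max_depth setting.

```python
from itertools import combinations

# Columns named in the question; the values in `row` are made-up toy numbers.
COLUMNS = ["Age", "PClass", "log10SplitFare"]


def pairwise_features(row, columns):
    """Return the sum and product of every unordered pair of `columns` in one row."""
    features = {}
    for left, right in combinations(columns, 2):
        features[f"{left} + {right}"] = row[left] + row[right]
        features[f"{left} * {right}"] = row[left] * row[right]
    return features


# "SibSp" is present in the row but not in COLUMNS, so it is never combined.
row = {"Age": 22.0, "PClass": 3, "log10SplitFare": 0.5, "SibSp": 1}
new_features = pairwise_features(row, COLUMNS)

for name, value in new_features.items():
    print(name, "=", value)
```

Comparing the column names in your feature_matrix against a list like this is a quick way to check that only the intended columns were combined.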

Upvotes: 6
