Marcel Mendes Reis

Reputation: 107

FeatureTools GroupBy issue excluding entities

This question is a follow-up to an earlier post.

I was able to solve the first part of my problem, but then another issue came up.

I have the following Featuretools entity set (shown as an image in the original post).

I would like to generate the groupby_trans_primitives Diff and TimeSincePrevious (in days), but only for the recordings entity, excluding the other entities: 'vendedores', 'produtos', 'cliente', 'produto_cliente'.

I tried the following code to exclude those entities, without success:

import featuretools as ft
from featuretools.primitives import TimeSincePrevious
time_since_previous = TimeSincePrevious(unit="days")

fm, features = ft.dfs(entityset=es, 
                      target_entity='recordings',
                      trans_primitives = [],
                      agg_primitives = [],
                      max_depth=2,
                      verbose=True,
                      groupby_trans_primitives=['Diff',time_since_previous],
                      primitive_options={'time_since_previous': {'ignore_groupby_entities': ['vendedores','produtos','cliente']}})

However, the code returned the following features:

Built 38 features
Elapsed: 00:38 | Progress: 100%|██████████
[<Feature: CODIGO_CLIENTE>,
 <Feature: NOME_VENDEDOR>,
 <Feature: CODIGO_PRODUTO>,
 <Feature: QUANTIDADE>,
 <Feature: VALOR_TOTAL>,
 <Feature: PRODUTO_CLIENTE>,
 <Feature: DIFF(QUANTIDADE) by PRODUTO_CLIENTE>,
 <Feature: DIFF(QUANTIDADE) by CODIGO_PRODUTO>,
 <Feature: DIFF(QUANTIDADE) by NOME_VENDEDOR>,
 <Feature: DIFF(QUANTIDADE) by CODIGO_CLIENTE>,
 <Feature: DIFF(VALOR_TOTAL) by PRODUTO_CLIENTE>,
 <Feature: DIFF(VALOR_TOTAL) by CODIGO_PRODUTO>,
 <Feature: DIFF(VALOR_TOTAL) by NOME_VENDEDOR>,
 <Feature: DIFF(VALOR_TOTAL) by CODIGO_CLIENTE>,
 <Feature: TIME_SINCE_PREVIOUS(DATA_NOTA, unit=days) by PRODUTO_CLIENTE>,
 <Feature: TIME_SINCE_PREVIOUS(DATA_NOTA, unit=days) by CODIGO_PRODUTO>,
 <Feature: TIME_SINCE_PREVIOUS(DATA_NOTA, unit=days) by NOME_VENDEDOR>,
 <Feature: TIME_SINCE_PREVIOUS(DATA_NOTA, unit=days) by CODIGO_CLIENTE>,
 <Feature: cliente.CLASSIFICACAO>,
 <Feature: cliente.REDE>,
 <Feature: cliente.CIDADE>,
 <Feature: cliente.UF>,
 <Feature: TIME_SINCE_PREVIOUS(vendedores.first_recordings_time, unit=days) by PRODUTO_CLIENTE>,
 <Feature: TIME_SINCE_PREVIOUS(vendedores.first_recordings_time, unit=days) by CODIGO_PRODUTO>,
 <Feature: TIME_SINCE_PREVIOUS(vendedores.first_recordings_time, unit=days) by NOME_VENDEDOR>,
 <Feature: TIME_SINCE_PREVIOUS(vendedores.first_recordings_time, unit=days) by CODIGO_CLIENTE>,
 <Feature: TIME_SINCE_PREVIOUS(produto_cliente.first_recordings_time, unit=days) by PRODUTO_CLIENTE>,
 <Feature: TIME_SINCE_PREVIOUS(produto_cliente.first_recordings_time, unit=days) by CODIGO_PRODUTO>,
 <Feature: TIME_SINCE_PREVIOUS(produto_cliente.first_recordings_time, unit=days) by NOME_VENDEDOR>,
 <Feature: TIME_SINCE_PREVIOUS(produto_cliente.first_recordings_time, unit=days) by CODIGO_CLIENTE>,
 <Feature: TIME_SINCE_PREVIOUS(cliente.first_recordings_time, unit=days) by PRODUTO_CLIENTE>,
 <Feature: TIME_SINCE_PREVIOUS(cliente.first_recordings_time, unit=days) by CODIGO_PRODUTO>,
 <Feature: TIME_SINCE_PREVIOUS(cliente.first_recordings_time, unit=days) by NOME_VENDEDOR>,
 <Feature: TIME_SINCE_PREVIOUS(cliente.first_recordings_time, unit=days) by CODIGO_CLIENTE>,
 <Feature: TIME_SINCE_PREVIOUS(produtos.first_recordings_time, unit=days) by PRODUTO_CLIENTE>,
 <Feature: TIME_SINCE_PREVIOUS(produtos.first_recordings_time, unit=days) by CODIGO_PRODUTO>,
 <Feature: TIME_SINCE_PREVIOUS(produtos.first_recordings_time, unit=days) by NOME_VENDEDOR>,
 <Feature: TIME_SINCE_PREVIOUS(produtos.first_recordings_time, unit=days) by CODIGO_CLIENTE>]

I don't understand why the following features were created, since my code specified that those entities should be excluded:

<Feature: TIME_SINCE_PREVIOUS(vendedores.first_recordings_time, unit=days) by PRODUTO_CLIENTE>,
 <Feature: TIME_SINCE_PREVIOUS(vendedores.first_recordings_time, unit=days) by CODIGO_PRODUTO>,
 <Feature: TIME_SINCE_PREVIOUS(vendedores.first_recordings_time, unit=days) by NOME_VENDEDOR>,
 <Feature: TIME_SINCE_PREVIOUS(vendedores.first_recordings_time, unit=days) by CODIGO_CLIENTE>,
 <Feature: TIME_SINCE_PREVIOUS(produto_cliente.first_recordings_time, unit=days) by PRODUTO_CLIENTE>,
 <Feature: TIME_SINCE_PREVIOUS(produto_cliente.first_recordings_time, unit=days) by CODIGO_PRODUTO>,
 <Feature: TIME_SINCE_PREVIOUS(produto_cliente.first_recordings_time, unit=days) by NOME_VENDEDOR>,
 <Feature: TIME_SINCE_PREVIOUS(produto_cliente.first_recordings_time, unit=days) by CODIGO_CLIENTE>,
 <Feature: TIME_SINCE_PREVIOUS(cliente.first_recordings_time, unit=days) by PRODUTO_CLIENTE>,
 <Feature: TIME_SINCE_PREVIOUS(cliente.first_recordings_time, unit=days) by CODIGO_PRODUTO>,
 <Feature: TIME_SINCE_PREVIOUS(cliente.first_recordings_time, unit=days) by NOME_VENDEDOR>,
 <Feature: TIME_SINCE_PREVIOUS(cliente.first_recordings_time, unit=days) by CODIGO_CLIENTE>,
 <Feature: TIME_SINCE_PREVIOUS(produtos.first_recordings_time, unit=days) by PRODUTO_CLIENTE>,
 <Feature: TIME_SINCE_PREVIOUS(produtos.first_recordings_time, unit=days) by CODIGO_PRODUTO>,
 <Feature: TIME_SINCE_PREVIOUS(produtos.first_recordings_time, unit=days) by NOME_VENDEDOR>,
 <Feature: TIME_SINCE_PREVIOUS(produtos.first_recordings_time, unit=days) by CODIGO_CLIENTE>]

I would appreciate any kind of help! Thank you!

Upvotes: 0

Views: 165

Answers (1)

Roy Wedge

Reputation: 236

There's actually a bug in how primitive_options ignores entities. It will be fixed in the next release of Featuretools, but for now you can filter out those features by keeping those primitive options and adding a drop_contains filter:

fm, features = ft.dfs(entityset=es, 
                      target_entity='recordings',
                      trans_primitives = [],
                      agg_primitives = [],
                      max_depth=2,
                      verbose=True,
                      groupby_trans_primitives=['Diff',time_since_previous],
                      drop_contains=["TIME_SINCE_PREVIOUS(produtos.", "TIME_SINCE_PREVIOUS(cliente.", "TIME_SINCE_PREVIOUS(produto_cliente.", "TIME_SINCE_PREVIOUS(vendedores."],
                      primitive_options={'time_since_previous': {'ignore_groupby_entities': ['vendedores','produtos','cliente']}})

Once the next release is out, we will update the answer to work without drop_contains.
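
To see why those four drop_contains strings remove only the unwanted features, here is a small plain-Python sketch of substring filtering on feature names. It is an illustration only, not Featuretools' own implementation; the helper keep_feature and the variable names are made up for this example, and the feature names are copied from the output in the question.

```python
# Entities whose features should be excluded (names taken from the question).
excluded_entities = ["vendedores", "produtos", "cliente", "produto_cliente"]

# Build one pattern per entity. The "TIME_SINCE_PREVIOUS(" prefix and the
# trailing "." restrict the match to features computed on a column of that
# entity, so direct features such as cliente.CLASSIFICACAO are untouched.
drop_patterns = [f"TIME_SINCE_PREVIOUS({name}." for name in excluded_entities]


def keep_feature(feature_name, patterns):
    """Hypothetical helper: True if the name contains none of the patterns."""
    return not any(pattern in feature_name for pattern in patterns)


# A few feature names copied from the output in the question.
feature_names = [
    "DIFF(QUANTIDADE) by PRODUTO_CLIENTE",
    "TIME_SINCE_PREVIOUS(DATA_NOTA, unit=days) by CODIGO_PRODUTO",
    "cliente.CLASSIFICACAO",
    "TIME_SINCE_PREVIOUS(vendedores.first_recordings_time, unit=days) by NOME_VENDEDOR",
    "TIME_SINCE_PREVIOUS(produto_cliente.first_recordings_time, unit=days) by CODIGO_CLIENTE",
]

kept = [name for name in feature_names if keep_feature(name, drop_patterns)]
print(kept)
```

The matching is case-sensitive, so a groupby suffix such as "by PRODUTO_CLIENTE" does not trigger the "TIME_SINCE_PREVIOUS(produto_cliente." pattern. Only the last two names in the sample list are dropped.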

Upvotes: 3
