How does df.groupby('A').agg('min') translate to featuretools?

Question

Say I have this simple snippet of code. I will group, aggregate, and merge the dataframe:

Using Pandas:

Data

df = pd.DataFrame({'A': [1, 1, 2, 2],
               'B': [1, 2, 3, 4],
               'C': [0.3, 0.2, 1.2, -0.5]})

df:

    A   B   C
0   1   1   0.3
1   1   2   0.2
2   2   3   1.2
3   2   4   -0.5

Group and Aggregate

df_result = df.groupby('A').agg('min')
df_result.columns =  ['groupby_A(min_'+x+')' for x in df_result.columns]

df_result:

    groupby_A(min_B)    groupby_A(min_C)
A       
1   1                   0.2
2   3                   -0.5

Merge

df_new = pd.merge(df,df_result,on='A')
df_new

df_new:

    A   B   C       groupby_A(min_B)    groupby_A(min_C)
0   1   1   0.3     1                   0.2
1   1   2   0.2     1                   0.2
2   2   3   1.2     3                  -0.5
3   2   4   -0.5    3                  -0.5

An Attempt using featuretools:

# ---- Import the Module ----
import featuretools as ft

# ---- Make the Entity Set (the set of all tables) ----
es = ft.EntitySet()

# ---- Make the Entity (the table) ----
es.entity_from_dataframe(entity_id = 'df', 
                         dataframe = df)


# ---- Do the Deep Feature Synthesis (group, aggregate, and merge the features) ----
feature_matrix, feature_names = ft.dfs(entityset = es, 
                                       target_entity = 'df',
                                       trans_primitives = ['cum_min'])

feature_matrix

feature_matrix:

        A       B       C       CUM_MIN(A)  CUM_MIN(B)  CUM_MIN(C)
index                       
0       1       1       0.3     1           1           0.3
1       1       2       0.2     1           1           0.2
2       2       3       1.2     1           1           0.2
3       2       4       -0.5    1           1           -0.5

How does the operation with Pandas translate into featuretools (preferably without adding another table)?

My attempt with featuretools does not give the right output, but I believe the process that I used is somewhat correct.

Max Kanter · Accepted Answer

Here is the recommended way to do it in Featuretools. You do need to create another table to make it work exactly as you want.

import featuretools as ft
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 3, 4],
                   'C': [0.3, 0.2, 1.2, -0.5]})

es = ft.EntitySet()

es.entity_from_dataframe(entity_id="example",
                          index="id",
                          make_index=True,
                          dataframe=df)

es.normalize_entity(new_entity_id="a_entity",
                    base_entity_id="example",
                    index="A")

fm, fl = ft.dfs(target_entity="example",
                entityset=es,
                agg_primitives=["min"])

fm

this returns

    A  B    C  a_entity.MIN(example.B)  a_entity.MIN(example.C)
id                                                             
0   1  1  0.3                        1                      0.2
1   1  2  0.2                        1                      0.2
2   2  3  1.2                        3                     -0.5
3   2  4 -0.5                        3                     -0.5

If you don't want to create an extra table you could try using the cum_min primitive which calculate the cumulative after grouping by A

df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 3, 4],
                   'C': [0.3, 0.2, 1.2, -0.5]})

es = ft.EntitySet()

es.entity_from_dataframe(entity_id="example",
                          index="id",
                          make_index=True,
                          variable_types={
                              "A": ft.variable_types.Id
                          },
                          dataframe=df,)

fm, fl = ft.dfs(target_entity="example",
                entityset=es,
                groupby_trans_primitives=["cum_min"])

fm

this returns

    B    C  A  CUM_MIN(C) by A  CUM_MIN(B) by A
id                                             
0   1  0.3  1              0.3              1.0
1   2  0.2  1              0.2              1.0
2   3  1.2  2              1.2              3.0
3   4 -0.5  2             -0.5              3.0

How does df.groupby('A').agg('min') translate to featuretools?

Using Pandas:

Data

Group and Aggregate

Merge

An Attempt using featuretools:

Answers (1)

Related Questions

How does df.groupby(&#39;A&#39;).agg(&#39;min&#39;) translate to featuretools?

Using Pandas:

Data

Group and Aggregate

Merge

An Attempt using featuretools:

Answers (1)

Related Questions

How does df.groupby('A').agg('min') translate to featuretools?