Creating an EntitySet in Featuretools: TypeError: 'str' object does not support item assignment

I have these 3 dataframes:

df_train (truncated):
    SK_ID_CURR  TARGET  NAME_CONTRACT_TYPE_Cash loans  \
0      100002       1                              1   
1      100003       0                              1   
2      100004       0                              0   
3      100006       0                              1   
4      100007       0                              1   

   NAME_CONTRACT_TYPE_Revolving loans  CODE_GENDER_F  CODE_GENDER_M  
0                                   0              0              1  
1                                   0              1              0  
2                                   1              0              1  
3                                   0              1              0  
4                                   0              0              1  

df_bureau (truncated):
    SK_ID_CURR  SK_ID_BUREAU  CREDIT_ACTIVE_Active
0      100002       5714464                     1
1      100002       5714465                     1
2      215354       5714466                     1
3      215354       5714467                     1
4      215354       5714468                     1

bureau_balance (truncated):
    SK_ID_BUREAU  MONTHS_BALANCE  STATUS_C
0       5715448               0         1
1       5715448              -1         1
2       5715448              -2         1
3       5715448              -3         1
4       5715448              -4         1 

And this is the script I am trying to run for deep feature synthesis:

entities = {
    "train"          : (df_train,         "SK_ID_CURR"),
    "bureau"         : (df_bureau,        "SK_ID_BUREAU"),
    "bureau_balance" : (df_bureau_balance, "MONTHS_BALANCE", "STATUS", "SK_ID_BUREAU"),
    }

relationships = [
    ("bureau", "SK_ID_BUREAU", "bureau_balance", "SK_ID_BUREAU"),
    ("train", "SK_ID_CURR", "bureau", "SK_ID_CURR")
    ]

feature_matrix_customers, features_defs = ft.dfs(entities=entities,
                                             relationships=relationships,
                                             target_entity="train"
                                             )

But whenever I include the column "STATUS", this error is raised: TypeError: 'str' object does not support item assignment

If I leave out the column "STATUS", it works with only a few rows of the dataframe. When the number of rows increases (and only adding STATUS to the key would make it unique), a different error is raised: AssertionError: Index is not unique on dataframe (Entity bureau_balance)

Thanks in advance!!

Upvotes: 4

Views: 822

Answers (2)

Max Kanter

Reputation: 2014

caseWestern's answer is the recommended way to create an EntitySet in Featuretools.

That being said, the error you are seeing happens because Featuretools expects the tuple for each entity to be (dataframe, index, time_index, variable_types), where variable_types is a dictionary dict[str -> Variable]. Right now you are passing a string as the 4th element (and "STATUS" is being read as the time index), so Featuretools fails when it tries to add entries to variable_types, because it isn't actually a dictionary.
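A minimal pure-Python sketch of this failure mode, assuming the tuple layout (dataframe, index, time_index, variable_types) described above. The assignment into variable_types is a hypothetical stand-in for what Featuretools does with that argument; it is not the library's actual code, and the dataframe is replaced by a placeholder because only the tuple layout matters here.

```python
# Placeholder for the real dataframe; only the tuple layout matters here.
df_bureau_balance = None

# The tuple from the question, unpacked in the documented order:
# (dataframe, index, time_index, variable_types)
entity = (df_bureau_balance, "MONTHS_BALANCE", "STATUS", "SK_ID_BUREAU")
df, index, time_index, variable_types = entity

print(index)                          # MONTHS_BALANCE
print(time_index)                     # STATUS  (read as the time index)
print(type(variable_types).__name__)  # str, not dict

# Hypothetical stand-in for adding a type entry: a string cannot be
# assigned into, which reproduces the message from the question.
try:
    variable_types[index] = "Index"
except TypeError as err:
    error_message = str(err)

print(error_message)  # 'str' object does not support item assignment

# With an actual dictionary, the same assignment succeeds.
fixed_types = {}
fixed_types[index] = "Index"
print(fixed_types)  # {'MONTHS_BALANCE': 'Index'}
```

In practice the simplest fix is to pass only (dataframe, index) for bureau_balance, or a real dict as the 4th element.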

See the EntitySet documentation for more information.

Upvotes: 0

willk

Reputation: 3827

You are right that each dataframe needs a unique index to be made into an entity. One simple option is to add a unique index to df_bureau_balance using

df_bureau_balance.reset_index(inplace = True)

and then make the entities:

entities = {
    "train"          : (df_train,         "SK_ID_CURR"),
    "bureau"         : (df_bureau,        "SK_ID_BUREAU"),
    "bureau_balance" : (df_bureau_balance, "index")
    }
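A small pandas sketch of why this works, using the five bureau_balance rows shown in the question. SK_ID_BUREAU repeats (one row per month), so it cannot serve as the index; reset_index moves the default RangeIndex into a new "index" column holding 0..n-1, which is unique by construction. Within these five rows MONTHS_BALANCE happens to be unique because they all belong to one bureau, which fits the observation that small samples work; once several bureaus are included, the same month values repeat.

```python
import pandas as pd

# The five bureau_balance rows shown in the question
df_bureau_balance = pd.DataFrame({
    "SK_ID_BUREAU": [5715448] * 5,
    "MONTHS_BALANCE": [0, -1, -2, -3, -4],
    "STATUS_C": [1] * 5,
})

# One row per month, so the bureau id repeats and is not a valid index.
print(df_bureau_balance["SK_ID_BUREAU"].is_unique)  # False

# reset_index turns the default RangeIndex into an "index" column.
df_bureau_balance.reset_index(inplace=True)
print(df_bureau_balance.columns.tolist())
# ['index', 'SK_ID_BUREAU', 'MONTHS_BALANCE', 'STATUS_C']
print(df_bureau_balance["index"].is_unique)  # True
```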

A much better option is to use an EntitySet to represent your data. When we create an entity from df_bureau_balance, which does not have a unique index, we pass in make_index = True and a name for the index (this can be any name, provided it is not already a column in the data). The rest is very similar to your work, just with slightly different syntax! Here is a complete working example:

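import featuretools as ft

# Create the entityset (Featuretools 0.x API; 1.0 replaced entity_from_dataframe with add_dataframe)
es = ft.EntitySet('customers')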

# Add the entities to the entityset
es = es.entity_from_dataframe('train', df_train, index = 'SK_ID_CURR')
es = es.entity_from_dataframe('bureau', df_bureau, index = 'SK_ID_BUREAU')
es = es.entity_from_dataframe('bureau_balance', df_bureau_balance, 
                               make_index = True, index = 'bureau_balance_index')

# Define the relationships
r_train_bureau = ft.Relationship(es['train']['SK_ID_CURR'], es['bureau']['SK_ID_CURR'])
r_bureau_balance = ft.Relationship(es['bureau']['SK_ID_BUREAU'], 
                                   es['bureau_balance']['SK_ID_BUREAU'])

# Add the relationships
es = es.add_relationships([r_train_bureau, r_bureau_balance])

# Deep feature synthesis
feature_matrix_customers, feature_defs = ft.dfs(entityset=es, target_entity = 'train')
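The make_index = True step can be approximated in plain pandas, which may help when checking the data before building the EntitySet. The helper add_surrogate_index below is hypothetical: it only mimics the described effect (a new, unique integer column under a name that must not already exist) and is not Featuretools' own implementation; placing the column first is an arbitrary choice.

```python
import pandas as pd


def add_surrogate_index(df: pd.DataFrame, name: str) -> pd.DataFrame:
    """Return a copy of df with a new unique integer column `name` in front.

    Hypothetical helper approximating make_index=True; not Featuretools code.
    """
    if name in df.columns:
        # The index name must be new, as the answer above points out.
        raise ValueError(f"column {name!r} already exists")
    out = df.copy()
    out.insert(0, name, range(len(out)))  # 0..n-1, unique by construction
    return out


df_bureau_balance = pd.DataFrame({
    "SK_ID_BUREAU": [5715448] * 5,
    "MONTHS_BALANCE": [0, -1, -2, -3, -4],
})

indexed = add_surrogate_index(df_bureau_balance, "bureau_balance_index")
print(indexed.columns.tolist())
# ['bureau_balance_index', 'SK_ID_BUREAU', 'MONTHS_BALANCE']
print(indexed["bureau_balance_index"].is_unique)  # True

# Reusing an existing column name is rejected.
try:
    add_surrogate_index(df_bureau_balance, "SK_ID_BUREAU")
except ValueError as err:
    collision_message = str(err)
print(collision_message)  # column 'SK_ID_BUREAU' already exists
```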

Entitysets help you keep track of all your data in a single structure! The Featuretools documentation is good for getting down the basics of using entitysets and I would recommend giving it a read.

Upvotes: 4
