Creating Entity Set in Featuretools error TypeError: 'str' object does not support item assignment

Question

I have these 3 dataframes:

df_train truncated:____________________
    SK_ID_CURR  TARGET  NAME_CONTRACT_TYPE_Cash loans  \
0      100002       1                              1   
1      100003       0                              1   
2      100004       0                              0   
3      100006       0                              1   
4      100007       0                              1   

   NAME_CONTRACT_TYPE_Revolving loans  CODE_GENDER_F  CODE_GENDER_M  
0                                   0              0              1  
1                                   0              1              0  
2                                   1              0              1  
3                                   0              1              0  
4                                   0              0              1  

df_bureau truncated:____________________
    SK_ID_CURR  SK_ID_BUREAU  CREDIT_ACTIVE_Active
0      100002       5714464                     1
1      100002       5714465                     1
2      215354       5714466                     1
3      215354       5714467                     1
4      215354       5714468                     1

bureau_balance truncated 3:____________________
    SK_ID_BUREAU  MONTHS_BALANCE  STATUS_C
0       5715448               0         1
1       5715448              -1         1
2       5715448              -2         1
3       5715448              -3         1
4       5715448              -4         1

And this is the script I am trying to run for feature synthesis:

entities = {
    "train"          : (df_train,         "SK_ID_CURR"),
    "bureau"         : (df_bureau,        "SK_ID_BUREAU"),
    "bureau_balance" : (df_bureau_balance,"MONTHS_BALANCE", "STATUS", "SK_ID_BUREAU")                       , 
    }

relationships = [
    ("bureau", "SK_ID_BUREAU", "bureau_balance", "SK_ID_BUREAU"),
    ("train", "SK_ID_CURR", "bureau", "SK_ID_CURR")
             ]

feature_matrix_customers, features_defs = ft.dfs(entities=entities,
                                             relationships=relationships,
                                             target_entity="train"
                                             )

But whenever I introduce the column "STATUS", this error happens: TypeError: 'str' object does not support item assignment

If I leave out the column "STATUS", it works with only a few rows of the dataframe. When the number of rows increases (and only using STATUS as a key would solve it), this other error happens: AssertionError: Index is not unique on dataframe (Entity bureau_balance)

Thanks in advance!!

willk · Accepted Answer

You are right that each dataframe needs a unique index to be made into an entity. One simple option is to add a unique index to df_bureau_balance using

df_bureau_balance.reset_index(inplace = True)

and then making the entities:

entities = {
    "train"          : (df_train,         "SK_ID_CURR"),
    "bureau"         : (df_bureau,        "SK_ID_BUREAU"),
    "bureau_balance" : (df_bureau_balance, "index")
    }
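To see why this works, here is a minimal sketch using only pandas. The small dataframe is a made-up stand-in for df_bureau_balance (the values are illustrative, not the real data); it shows that neither original column is unique on its own, while the column created by reset_index is:

```python
import pandas as pd

# Toy stand-in for df_bureau_balance (illustrative values, not the real data)
df_bureau_balance = pd.DataFrame({
    "SK_ID_BUREAU":   [5715448, 5715448, 5715449, 5715449],
    "MONTHS_BALANCE": [0, -1, 0, -1],
    "STATUS_C":       [1, 1, 0, 1],
})

# Neither candidate column identifies a row on its own
print(df_bureau_balance["SK_ID_BUREAU"].is_unique)
print(df_bureau_balance["MONTHS_BALANCE"].is_unique)

# reset_index() moves the default unnamed index into a new column called "index"
df_bureau_balance = df_bureau_balance.reset_index()
print(df_bureau_balance["index"].is_unique)
```

The non-inplace form is used here only to keep the sketch easy to follow; it has the same effect as the inplace call above.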

A much better option is to use an EntitySet to represent your data. Because df_bureau_balance does not have a unique index, we pass make_index = True when creating its entity, along with a name for the new index (any name works, provided it is not already a column in the data). The rest is very similar to your code, just with slightly different syntax. Here is a complete working example:

import featuretools as ft

# Create the entityset
es = ft.EntitySet('customers')

# Add the entities to the entityset
es = es.entity_from_dataframe('train', df_train, index = 'SK_ID_CURR')
es = es.entity_from_dataframe('bureau', df_bureau, index = 'SK_ID_BUREAU')
es = es.entity_from_dataframe('bureau_balance', df_bureau_balance, 
                               make_index = True, index = 'bureau_balance_index')

# Define the relationships
r_train_bureau = ft.Relationship(es['train']['SK_ID_CURR'], es['bureau']['SK_ID_CURR'])
r_bureau_balance = ft.Relationship(es['bureau']['SK_ID_BUREAU'], 
                                   es['bureau_balance']['SK_ID_BUREAU'])

# Add the relationships
es = es.add_relationships([r_train_bureau, r_bureau_balance])

# Deep feature synthesis
feature_matrix_customers, feature_defs = ft.dfs(entityset=es, target_entity = 'train')

Entitysets help you keep track of all your data in a single structure! The Featuretools documentation is good for getting down the basics of using entitysets and I would recommend giving it a read.
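
As for the original TypeError, a likely cause (an assumption based on the entities format described in the older Featuretools documentation, not something verified against your installed version) is that each tuple is read positionally as (dataframe, index, time_index, variable_types). Under that reading, the fourth element "SK_ID_BUREAU" lands where a dictionary is expected. The sketch below is plain Python with a hypothetical helper, unpack_entity_tuple, that only mimics this positional reading; it is not Featuretools code:

```python
def unpack_entity_tuple(entry):
    """Hypothetical helper: label tuple positions as the legacy docs describe them."""
    names = ("dataframe", "index", "time_index", "variable_types")
    return dict(zip(names, entry))

# The string "<df_bureau_balance>" is a placeholder for the real dataframe
entry = ("<df_bureau_balance>", "MONTHS_BALANCE", "STATUS", "SK_ID_BUREAU")
parsed = unpack_entity_tuple(entry)

# The fourth position is a plain string, not a dict of column types
print(parsed["variable_types"])

# Treating that string like a dict reproduces the same kind of error
try:
    parsed["variable_types"]["some_column"] = "some_type"
except TypeError as exc:
    message = str(exc)
    print(message)
```

If that is what is happening, listing extra column names in the tuple cannot build a composite key; a single unique index column (or make_index = True) is still needed.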
