Feature Store: Patterns for reusing the same features across different models

Question

I'm looking to use a feature store to optimize feature reuse across many different models.

Example: I have 10 different models that use the same 2 feature sets (i.e., 2 datasets of features without labels). The main difference is that each model predicts a different set of labels.

I could not find any well-known pattern on the web, so I've come up with 3 different strategies, none of which really convinces me.

Less reusable but simple solution: For each feature set, “replicate” it and create one group per model, with that model's dedicated labels. With 2 feature sets and 10 models, we would have 20 different groups that share the same features and differ only in their labels.

More reusable but complex solution (a): Create only 2 feature groups, but include the labels for all the models. Then, when creating the dataset, filter the group to retrieve only the label column for the specific model being trained. With 2 feature sets and 10 models, you would have only 2 groups, each with 10 extra columns, one per label.

More reusable but complex solution (b): Create 2 feature groups, plus one feature group for each label set. Then, when creating the dataset, select the “shared” feature group and the group that contains the label column for the specific model being trained. With 2 feature sets and 10 models, you would have 12 groups: the 2 “shared” ones plus 10 more, one per label set.
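
To make (a) and (b) concrete, here is a minimal sketch of how I picture building a training dataset in each case. It uses plain Python dictionaries as stand-ins for feature groups and is not the SageMaker Feature Store API; every name and value in it (`shared_features`, `wide_group`, `label_groups`, the record ids, the label columns) is hypothetical.

```python
# Plain-Python stand-ins for feature groups: record_id -> row (a dict).
# All names and values are hypothetical illustrations, not SageMaker calls.

shared_features = {
    "r1": {"f1": 0.5, "f2": 3},
    "r2": {"f1": 1.5, "f2": 7},
}

# Option (a): one wide group holding the features plus one label column per model.
wide_group = {
    "r1": {"f1": 0.5, "f2": 3, "label_model_a": 1, "label_model_b": 0},
    "r2": {"f1": 1.5, "f2": 7, "label_model_a": 0, "label_model_b": 1},
}

# Option (b): labels live in their own group, one group per model.
label_groups = {
    "model_a": {"r1": {"label": 1}, "r2": {"label": 0}},
    "model_b": {"r1": {"label": 0}},  # r2 has no label for model_b
}


def dataset_from_wide(group, feature_names, label_column):
    """Option (a): keep the feature columns and a single label column."""
    return {
        rid: {**{name: row[name] for name in feature_names}, "label": row[label_column]}
        for rid, row in group.items()
    }


def dataset_from_join(features, labels):
    """Option (b): inner join of shared features and one label group on record id."""
    return {
        rid: {**features[rid], **labels[rid]}
        for rid in features
        if rid in labels
    }


train_a = dataset_from_wide(wide_group, ["f1", "f2"], "label_model_a")
train_b = dataset_from_join(shared_features, label_groups["model_b"])
```

In this sketch, a record without a label for a given model simply drops out of the join in (b), whereas in (a) the wide group would need some placeholder value in that model's label column.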

I would be keen to use the second solution, but I'm not experienced enough to understand the potential risks (versioning, lineage, maintainability, etc.).

What do you think? Would you suggest a different approach?

For reference, I'm working on AWS, using SageMaker Feature Store.

