Reputation: 110
I want to keep one column of my dataframe in its original state, without applying any primitive to it. Is that possible?
Upvotes: 1
Views: 392
Reputation: 2014
Yes, you can do this with the ignore_variables parameter to ft.dfs. Here's an example on a demo entity set.
import featuretools as ft
es = ft.demo.load_mock_customer(return_entityset=True)
es.plot()
If we want to build features for the sessions entity but ignore the device variable, we can run
feature_defs = ft.dfs(target_entity="sessions",
                      entityset=es,
                      agg_primitives=["count", "mode"],
                      trans_primitives=[],
                      ignore_variables={"sessions": ["device"]},
                      features_only=True)
feature_defs now has the following features:
[<Feature: customer_id>,
<Feature: COUNT(transactions)>,
<Feature: MODE(transactions.product_id)>,
<Feature: customers.zip_code>,
<Feature: MODE(transactions.products.brand)>,
<Feature: customers.COUNT(sessions)>,
<Feature: customers.COUNT(transactions)>,
<Feature: customers.MODE(transactions.product_id)>]
This creates features using the count and mode primitives, but ignores the device variable in the sessions entity. If we want to include the device variable in its original state, we can add it back in like this:
feature_defs += [ft.Feature(es["sessions"]["device"])]
Now we can calculate the feature matrix. device is now the last column:
fm = ft.calculate_feature_matrix(features=feature_defs, entityset=es)
fm
customer_id COUNT(transactions) MODE(transactions.product_id) customers.zip_code ... customers.COUNT(sessions) customers.COUNT(transactions) customers.MODE(transactions.product_id) device
session_id ...
1 2 16 3 13244 ... 7 93 4 desktop
2 5 10 5 60091 ... 6 79 5 mobile
3 4 15 1 60091 ... 8 109 2 mobile
4 1 25 5 60091 ... 8 126 4 mobile
5 4 11 5 60091 ... 8 109 2 mobile
6 1 15 4 60091 ... 8 126 4 tablet
7 3 15 1 13244 ... 6 93 1 tablet
8 4 18 1 60091 ... 8 109 2 tablet
9 1 15 1 60091 ... 8 126 4 desktop
10 2 15 2 13244 ... 7 93 4 tablet
11 4 15 3 60091 ... 8 109 2 mobile
12 4 10 4 60091 ... 8 109 2 desktop
13 4 12 2 60091 ... 8 109 2 mobile
14 1 12 4 60091 ... 8 126 4 tablet
15 2 8 2 13244 ... 7 93 4 desktop
16 2 10 4 13244 ... 7 93 4 desktop
17 2 13 1 13244 ... 7 93 4 tablet
18 1 12 2 60091 ... 8 126 4 desktop
19 3 17 1 13244 ... 6 93 1 desktop
20 5 15 1 60091 ... 6 79 5 desktop
21 4 18 5 60091 ... 8 109 2 desktop
22 4 10 2 60091 ... 8 109 2 desktop
23 3 11 3 13244 ... 6 93 1 desktop
24 5 14 4 60091 ... 6 79 5 tablet
25 3 16 1 13244 ... 6 93 1 desktop
26 1 16 1 60091 ... 8 126 4 tablet
27 1 15 5 60091 ... 8 126 4 mobile
28 5 18 2 60091 ... 6 79 5 mobile
29 1 16 4 60091 ... 8 126 4 mobile
30 5 14 3 60091 ... 6 79 5 desktop
31 2 18 3 13244 ... 7 93 4 mobile
32 5 8 3 60091 ... 6 79 5 mobile
33 2 13 3 13244 ... 7 93 4 mobile
34 3 18 4 13244 ... 6 93 1 desktop
35 3 16 5 13244 ... 6 93 1 mobile
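device ends up last because feature_defs is an ordinary Python list and the raw feature was appended to it. The sketch below only illustrates that list behaviour. Plain strings stand in for the Feature objects, and it assumes (as the output above suggests, but the sketch does not verify) that the feature matrix columns follow the order of the list. The prepended variant is a hypothetical alternative, not something the answer itself uses.

```python
# Plain strings stand in for featuretools Feature objects (illustration only).
feature_names = [
    "customer_id",
    "COUNT(transactions)",
    "MODE(transactions.product_id)",
    "customers.zip_code",
    "MODE(transactions.products.brand)",
    "customers.COUNT(sessions)",
    "customers.COUNT(transactions)",
    "customers.MODE(transactions.product_id)",
]
raw = "device"

# Mirrors `feature_defs += [...]`: the raw feature goes to the end of the list.
appended = feature_names + [raw]

# Hypothetical alternative: put the raw feature at the front instead.
prepended = [raw] + feature_names

print(appended[-1])
print(prepended[0])
```

With real Feature objects the same idea would read [ft.Feature(es["sessions"]["device"])] + feature_defs.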
As a sanity check, this is the output if we don't use ignore_variables:
feature_defs = ft.dfs(target_entity="sessions",
                      entityset=es,
                      agg_primitives=["count", "mode"],
                      trans_primitives=[],
                      features_only=True)
You can see that <Feature: device> and the derived feature <Feature: customers.MODE(sessions.device)> now get created:
[<Feature: customer_id>,
<Feature: device>,
<Feature: COUNT(transactions)>,
<Feature: MODE(transactions.product_id)>,
<Feature: customers.zip_code>,
<Feature: MODE(transactions.products.brand)>,
<Feature: customers.COUNT(sessions)>,
<Feature: customers.MODE(sessions.device)>,
<Feature: customers.COUNT(transactions)>,
<Feature: customers.MODE(transactions.product_id)>]
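To see exactly what ignore_variables removed, you can diff the two feature lists by name. This is a minimal sketch on the names copied from the two outputs above, held as plain strings. With real Feature objects you could build the same lists with [f.get_name() for f in feature_defs]. The variable names with_ignore and without_ignore are made up for the illustration.

```python
# Feature names copied from the two outputs above, as plain strings.
with_ignore = [
    "customer_id",
    "COUNT(transactions)",
    "MODE(transactions.product_id)",
    "customers.zip_code",
    "MODE(transactions.products.brand)",
    "customers.COUNT(sessions)",
    "customers.COUNT(transactions)",
    "customers.MODE(transactions.product_id)",
]
without_ignore = [
    "customer_id",
    "device",
    "COUNT(transactions)",
    "MODE(transactions.product_id)",
    "customers.zip_code",
    "MODE(transactions.products.brand)",
    "customers.COUNT(sessions)",
    "customers.MODE(sessions.device)",
    "customers.COUNT(transactions)",
    "customers.MODE(transactions.product_id)",
]

# Features that exist only when device is NOT ignored, in their original order.
kept = set(with_ignore)
removed = [name for name in without_ignore if name not in kept]

print(removed)  # ['device', 'customers.MODE(sessions.device)']
```

So ignoring device drops both the raw column and every feature derived from it, which is why the raw column has to be added back by hand.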
Upvotes: 3