Why is the constructor of an sklearn transformer inside a ColumnTransformer invoked twice, and with different parameters each time?

Question

Three questions about the code below and its output:

Why is the constructor of the MyDebug transformer invoked twice: first at line 26, and again at line 37?
Why do the two invocations receive different values for the parameter myname? The second one, at line 37, is especially strange: it receives neither the passed argument nor even the default value, but None, as the output shows.
If you uncomment line 36 (ct1.fit), it also invokes the transformer's transform method. Why, when I would expect that only from ct1.fit_transform?
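For context on the first two questions, here is a minimal sketch of what `sklearn.base.clone` does, assuming the documented scikit-learn convention that `__init__` stores each argument as an attribute with the same name. `clone` reads the parameters back through `get_params()` and then calls the class constructor again, so a second `__init__` call on a brand-new object (with a new `id()`) is expected whenever an estimator is cloned. `Counted`, `RenamedParam`, `init_calls` and `lost` are hypothetical names made up for this illustration; `RenamedParam` mimics MyDebug by storing `myname` under a different attribute.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone

init_calls = []  # records every constructor call, across all instances


class Counted(BaseEstimator, TransformerMixin):
    """Hypothetical transformer following the convention: myname -> self.myname."""

    def __init__(self, myname="HELP"):
        init_calls.append(myname)
        self.myname = myname


class RenamedParam(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that, like MyDebug, stores myname as self._name."""

    def __init__(self, myname="HELP"):
        self._name = myname


original = Counted("MYDEBUG_1")
cloned = clone(original)  # get_params() -> {"myname": ...} -> Counted(myname=...)

print(init_calls)              # constructor ran once for original, once for the clone
print(cloned is original)      # a different object, hence a different id()
print(cloned.get_params())

# get_params() looks for an attribute literally called "myname", which
# RenamedParam never sets. Older releases such as 0.22 fall back to None
# (matching the output in this question); newer releases raise AttributeError.
try:
    lost = RenamedParam("MYDEBUG_1").get_params()["myname"]
except AttributeError:
    lost = "AttributeError"
print(lost)
```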

Environment: Python version is 3.6.10 and Sklearn version is 0.22.1

  1 import numpy as np
  2 from sklearn.compose import ColumnTransformer
  3 from sklearn.preprocessing import Normalizer
  4 from sklearn.base import BaseEstimator,TransformerMixin
  5 from sklearn.pipeline import Pipeline
  6 from datetime import datetime
  7
  8
  9
 10 class MyDebug(BaseEstimator, TransformerMixin):
 11     def __init__(self, myname="HELP"):
 12         print(f"intialized with myname: {myname}")
 13         self._name = myname
 14         print (f"Debug.__init__ being invoked for {myname}, {self._name}, {id(self)}")
 15     def transform(self, X):
 16         print (f"in {self._name} transform with type: {type(X)}, shape: {X.shape} at {datetime.now()}")
 17         self.shape = X.shape
 18         # what other output you want
 19         return X
 20     def fit(self, X, y=None, **fit_params):
 21         print (f"in {self._name} fit with type: {type(X)}, shape: {X.shape} at {datetime.now()}")
 22         return self
 23
 24
 25 print("************************************************************")
 26 ct1 = ColumnTransformer(
 27     [("norm1", Pipeline(steps=[("norm", Normalizer(norm='l1')), ("debug", MyDebug("MYDEBUG_1"))]), [0, 1]),
 28      ("norm2", Pipeline(steps=[("norm", Normalizer(norm='l1')), ("debug", MyDebug("MYDEBUG_2"))]), slice(2, 10))])
 29
 30 print("************************************************************")
 31 print(f"id(ct1): {id(ct1)}")
 32 X = np.array([[0., 1., 2., 2., 0., 1., 2., 2.],
 33               [1., 1., 0., 1., 1., 1., 0., 1.]])
 34
 35 print("************************************************************")
 36 # ret = ct1.fit(X)
 37 ret = ct1.fit_transform(X)
 38 print("************************************************************")
 39 print(f"id(ct1): {id(ct1)}")
 40 print(f"type(ret): {type(ret)}")
 41 print(type(ct1.named_transformers_["norm1"]), id(ct1.named_transformers_["norm1"]), id(ct1.named_transformers_["norm2"]), "\n",
 42 type(ct1.named_transformers_["norm1"].named_steps["norm"]), id(ct1.named_transformers_["norm1"].named_steps["norm"]), id(ct1.named_transformers_["norm2"].named_steps["norm"]), "\n",
 43 type(ct1.named_transformers_["norm1"].named_steps["debug"]), id(ct1.named_transformers_["norm1"].named_steps["debug"]), id(ct1.named_transformers_["norm2"].named_steps["debug"]))

Output:

************************************************************
intialized with myname: MYDEBUG_1
Debug.__init__ being invoked for MYDEBUG_1, MYDEBUG_1, 140118618819160
intialized with myname: MYDEBUG_2
Debug.__init__ being invoked for MYDEBUG_2, MYDEBUG_2, 140118618819216
************************************************************
id(ct1): 140118618819328
************************************************************
intialized with myname: None
Debug.__init__ being invoked for None, None, 140118618819944
in None fit with type: <class 'numpy.ndarray'>, shape: (2, 2) at 2021-03-24 00:45:41.850603
in None transform with type: <class 'numpy.ndarray'>, shape: (2, 2) at 2021-03-24 00:45:41.851159
intialized with myname: None
Debug.__init__ being invoked for None, None, 140118618820392
in None fit with type: <class 'numpy.ndarray'>, shape: (2, 6) at 2021-03-24 00:45:41.852955
in None transform with type: <class 'numpy.ndarray'>, shape: (2, 6) at 2021-03-24 00:45:41.852995
************************************************************
id(ct1): 140118618819328
type(ret): <class 'numpy.ndarray'>
 140118618819776 140118618820000 
  140118618819888 140118618820112 
  140118618819944 140118618820392
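A minimal sketch for checking these observations against the same setup, assuming the only change needed is for MyDebug to store `myname` as `self.myname` (the attribute name `get_params()` reads). It calls `ct1.fit(X)`, the line 36 variant, instead of `fit_transform`. `seen_names` is a hypothetical list added only to record which name each fitted copy reports from `transform()`. If `ColumnTransformer.fit` delegates to `fit_transform` internally (as far as I know it does in 0.22 and later releases), `transform()` runs even though only `fit` is called.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer

seen_names = []  # hypothetical log: one entry per transform() call


class MyDebug(BaseEstimator, TransformerMixin):
    def __init__(self, myname="HELP"):
        # Same name as the __init__ argument, so get_params()/clone() can read it back.
        self.myname = myname

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        seen_names.append(self.myname)  # record which name this (cloned) instance has
        return X


debug_1 = MyDebug("MYDEBUG_1")
debug_2 = MyDebug("MYDEBUG_2")
ct1 = ColumnTransformer(
    [("norm1", Pipeline(steps=[("norm", Normalizer(norm="l1")), ("debug", debug_1)]), [0, 1]),
     ("norm2", Pipeline(steps=[("norm", Normalizer(norm="l1")), ("debug", debug_2)]), slice(2, 10))])

X = np.array([[0., 1., 2., 2., 0., 1., 2., 2.],
              [1., 1., 0., 1., 1., 1., 0., 1.]])
ct1.fit(X)  # line 36's variant: only fit is requested

fitted_debug_1 = ct1.named_transformers_["norm1"].named_steps["debug"]
print(seen_names)                 # transform() already ran, with the passed names
print(fitted_debug_1 is debug_1)  # False: the fitted step is a clone
print(fitted_debug_1.myname)
```

With the parameter stored under its own name, the clones report the names passed at construction instead of None, while the fitted steps are still new objects, which is consistent with the different `id()` values in the output above.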
