Reputation: 4647
I have a simple operation on a pandas DataFrame, like this:
import pandas as pd

# initialization
dct = {1: 'A', 2: 'B', 3: 'C'}
df = pd.DataFrame({'id': [1, 2, 3], 'value': [7, 8, 9]})
# actual transformation
df['newid'] = df.id.map(dct)
I would like to make this transformation part of a sklearn pipeline. I found a few tutorials here, here, and here, but I just can't get it to work. Here is one of the many versions I have tried:
# initialization
dct = {1: 'A', 2: 'B', 3: 'C'}
df = pd.DataFrame({'id': [1, 2, 3], 'value': [7, 8, 9]})

# define a class similar to those in the tutorials
class idMapper(BaseEstimator, TransformerMixin):
    def __init__(self, key='id'):
        self.key = key

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[key].map(dct)

# Apply the transformation
idMapper.fit_transform(df)
The error message is: TypeError: fit_transform() missing 1 required positional argument: 'X'. Can anyone help me fix this issue and get it working? Thanks!
Upvotes: 0
Views: 415
Reputation: 4150
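See below for a corrected version of your code. There are two fixes, both marked in the comments. First, transform must refer to self.key rather than the bare name key. Second, the class must be instantiated before calling fit_transform: calling idMapper.fit_transform(df) on the class itself passes df in place of self, which is why Python reports X as missing.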
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

dct = {1: 'A', 2: 'B', 3: 'C'}
df = pd.DataFrame({'id': [1, 2, 3], 'value': [7, 8, 9]})

# define a class similar to those in the tutorials
class idMapper(BaseEstimator, TransformerMixin):
    def __init__(self, key='id'):
        self.key = key

    def fit(self, X, y=None):
        # nothing to learn; the mapping is fixed
        return self

    def transform(self, X):
        return X[self.key].map(dct)  # <--- self.key

# Apply the transformation
idMapper().fit_transform(df)  # <--- need to instantiate
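Since the goal is to use this inside a sklearn pipeline, here is a minimal sketch of one way that could look. It is a variant of the class above, not the only way to do it. The names IdMapper, mapping, new_key, and the step name 'map_ids' are made up for illustration. It assumes you want the whole DataFrame back with the new column added, rather than just the mapped Series, and it passes the dictionary in through the constructor instead of reading the global dct.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class IdMapper(BaseEstimator, TransformerMixin):
    """Hypothetical variant: add a column by mapping another column through a dict."""

    def __init__(self, mapping=None, key='id', new_key='newid'):
        # Store constructor arguments unchanged so get_params/clone keep working.
        self.mapping = mapping
        self.key = key
        self.new_key = new_key

    def fit(self, X, y=None):
        # Nothing to learn; the mapping is supplied up front.
        return self

    def transform(self, X):
        # Work on a copy so the caller's DataFrame is left untouched.
        X = X.copy()
        X[self.new_key] = X[self.key].map(self.mapping)
        return X


dct = {1: 'A', 2: 'B', 3: 'C'}
df = pd.DataFrame({'id': [1, 2, 3], 'value': [7, 8, 9]})

# A one-step pipeline; further transformers or an estimator could follow.
pipe = Pipeline([('map_ids', IdMapper(mapping=dct))])
result = pipe.fit_transform(df)
```

Returning the full DataFrame lets later pipeline steps still see the other columns, and taking the mapping as a parameter keeps the transformer independent of module-level state.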
Upvotes: 3