shorooq

Reputation: 11

How to use sklearn pre-processors

I have a data set with three columns and I want to apply an SVM machine learning algorithm to it, but I don't know what is wrong with my code.

I wrote this code:

tfidf_vectorizer = TfidfVectorizer()
attack_data = pd.DataFrame(attack_data, columns = ['payload', 'label', 'attack_type'])
tf_train_data = pd.concat([attack_data['payload'], attack_data['attack_type']])
trained_tf_idf_transformer = tfidf_vectorizer.fit_transform(tf_train_data)
attack_data['tf_idf_payload'] = trained_tf_idf_transformer.transform(attack_data['payload'])
attack_data['tf_idf_attack_type'] = trained_tf_idf_transformer.transform(attack_data['attack_type'])
data_for_model = attack_data[['tf_idf_payload', 'tf_idf_attack_type', 'label']]
x = data_for_model[['tf_idf_payload', 'tf_idf_attack_type']].as_matrix()
y = data_for_model['label'].as_matrix()
with open("x_result.pkl", 'wb') as handls:
    p.dump(trained_tf_idf_transformer, handls)

This error is raised at: attack_data['tf_idf_payload'] = trained_tf_idf_transformer.transform(attack_data['payload'])

File "C:\Users\me\Anaconda3\lib\site-packages\scipy\sparse\base.py", line 686, in __getattr__
    raise AttributeError(attr + " not found")

AttributeError: transform not found

Upvotes: 1

Views: 69

Answers (1)

Corentin Limier

Reputation: 5006

That's because fit_transform does not return the fitted transformer; it returns the transformed data.

trained_tf_idf_transformer = tfidf_vectorizer.fit_transform(tf_train_data)
attack_data['tf_idf_payload'] = trained_tf_idf_transformer.transform(attack_data['payload'])

is wrong and should be:

tf_train_data_transformed = tfidf_vectorizer.fit_transform(tf_train_data)
attack_data['tf_idf_payload'] = tfidf_vectorizer.transform(attack_data['payload'])

Note that you can use the same tfidf_vectorizer object to transform your other data, because fitting updates the object in place.
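To make the difference concrete, here is a minimal sketch. The sample strings are made up for illustration and are not the asker's data. It checks that fit returns the vectorizer itself, while fit_transform returns a sparse matrix that has no transform method:

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample texts, not the asker's data
texts = ["select * from users", "<script>alert(1)</script>", "normal request"]

vectorizer = TfidfVectorizer()

# fit() learns the vocabulary and returns the vectorizer itself
fitted = vectorizer.fit(texts)

# fit_transform() learns the vocabulary and returns the transformed data
matrix = TfidfVectorizer().fit_transform(texts)

returns_self = fitted is vectorizer                  # same object
matrix_is_sparse = issparse(matrix)                  # SciPy sparse matrix
matrix_has_transform = hasattr(matrix, "transform")  # no such method
```

So calling .transform on the result of fit_transform fails, while calling it on the vectorizer works.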

I cannot run your example because it is not reproducible, and I have not worked through every step of it, but look at this one:

import pandas as pd
from sklearn.preprocessing import StandardScaler

df_train = pd.DataFrame({'data': [1,2,3]})
df_validation = pd.DataFrame({'data': [1,2,3]})

scaler = StandardScaler()
scaler_trained = scaler.fit_transform(df_train)
df_validation_transformed = scaler_trained.transform(df_validation)

raises the same kind of error (an AttributeError), because scaler_trained holds the transformed data, not the scaler.

This code works:

import pandas as pd
from sklearn.preprocessing import StandardScaler

df_train = pd.DataFrame({'data': [1,2,3]})
df_validation = pd.DataFrame({'data': [1,2,3]})

scaler = StandardScaler()
df_train_transformed = scaler.fit_transform(df_train)
df_validation_transformed = scaler.transform(df_validation)

You just need to follow the same logic.
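Applied to the TF-IDF case from the question, the same pattern might look like the sketch below. The rows are invented for illustration. Combining the two TF-IDF matrices with scipy.sparse.hstack and using SVC with a linear kernel are assumptions about how the features and model should be set up, not something stated in the question:

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Invented rows using the question's column names
attack_data = pd.DataFrame({
    "payload": ["select * from users", "<script>alert(1)</script>",
                "hello world", "home page"],
    "attack_type": ["sqli", "xss", "norm", "norm"],
    "label": [1, 1, 0, 0],
})

tfidf_vectorizer = TfidfVectorizer()

# Fit once on all the text; the vectorizer object itself is updated
tfidf_vectorizer.fit(pd.concat([attack_data["payload"],
                                attack_data["attack_type"]]))

# Reuse the same fitted vectorizer for each column
tf_idf_payload = tfidf_vectorizer.transform(attack_data["payload"])
tf_idf_attack_type = tfidf_vectorizer.transform(attack_data["attack_type"])

# Assumption: place the two sparse matrices side by side as features
x = hstack([tf_idf_payload, tf_idf_attack_type]).tocsr()
y = attack_data["label"].to_numpy()

# Assumption: a linear-kernel SVM, which accepts sparse input
model = SVC(kernel="linear").fit(x, y)
```

If the vectorizer needs to be reused later, pickle tfidf_vectorizer itself rather than the matrix returned by fit_transform.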

Upvotes: 1
