Suppose that I have a dataset and have built an ML model on it. The dataset is updated weekly, and after each update I want my model to predict the class of the new rows and append those predictions to the original dataset. How can I do this?
This is what I tried:
import pandas as pd
import numpy as np
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
df = pd.read_csv(url, names=names)
df
array = df.values
X = array[:,0:4]
y = array[:,4]
X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20, random_state=1)
I skipped some steps where I compared the scores of different models.
model = SVC(gamma='auto')
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)
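The score check that was skipped can be sketched like this. It is a minimal example, assuming scikit-learn's bundled copy of the iris data (`load_iris`) in place of the CSV URL so it runs offline; note that `load_iris` encodes the classes as integers 0 to 2 rather than strings. It scores the validation predictions with `accuracy_score`.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Bundled iris data, used instead of the CSV URL so the sketch runs offline
X, y = load_iris(return_X_y=True)
X_train, X_validation, Y_train, Y_validation = train_test_split(
    X, y, test_size=0.20, random_state=1)

model = SVC(gamma='auto')
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)

# Fraction of validation rows whose predicted class matches the true class
score = accuracy_score(Y_validation, predictions)
print(f"validation accuracy: {score:.3f}")
```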
Here I add some new data to test it:
new_data = [[5.9, 3.0, 5.7, 1.5], [4.8, 2.9, 3.0, 1.2]]
df2 = pd.DataFrame(new_data, columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"])
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df3 = pd.concat([df, df2], ignore_index=True)
df3
array2 = df3.values
X2 = array2[:,0:4]
predict = model.predict(X2)
predict
df3['pred'] = predict
def final_class(row):
    # Keep the known label; fall back to the model's prediction for new rows
    if pd.isnull(row['class']):
        return row['pred']
    else:
        return row['class']
df3['final_class'] = df3.apply(final_class, axis=1)
df3
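A shorter, vectorised alternative to the row-wise `final_class` function is `Series.fillna` with another Series: each missing label is filled from the `pred` value on the same row. The sketch uses a small hypothetical `df3` (two labelled rows, two new unlabelled rows) so it runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature df3: two labelled rows, two new rows with no label,
# and a 'pred' column holding the model's prediction for every row
df3 = pd.DataFrame({
    "class": ["Iris-setosa", "Iris-virginica", np.nan, np.nan],
    "pred": ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-versicolor"],
})

# Keep the known label where there is one, otherwise use the prediction
df3["final_class"] = df3["class"].fillna(df3["pred"])
print(df3)
```

The second row shows that a known label is kept even when the model disagrees with it.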
This works, but I don't think it's the best way to do it. Can someone help me?
It's the right way.
Alternatively, you can predict on the new rows only and append those predictions to the original dataset.
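A minimal sketch of that alternative, assuming scikit-learn's bundled iris data instead of the CSV URL. The helper `append_predictions` and the `predicted` flag column (which records which labels came from the model) are hypothetical names introduced for illustration, not part of the original code.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.svm import SVC

features = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width']

# Labelled data: bundled iris copy, using the question's column names
iris = load_iris()
df = pd.DataFrame(iris.data, columns=features)
df['class'] = iris.target_names[iris.target]
df['predicted'] = False  # hypothetical flag: True marks model-labelled rows

model = SVC(gamma='auto')
model.fit(df[features], df['class'])


def append_predictions(df, new_rows, model, features):
    """Hypothetical helper: label only the new rows, then append them to df."""
    new_rows = new_rows.copy()
    new_rows['class'] = model.predict(new_rows[features])
    new_rows['predicted'] = True
    return pd.concat([df, new_rows], ignore_index=True)


# This week's new, unlabelled rows (same values as in the question)
new_data = [[5.9, 3.0, 5.7, 1.5], [4.8, 2.9, 3.0, 1.2]]
df2 = pd.DataFrame(new_data, columns=features)

df = append_predictions(df, df2, model, features)
print(df.tail())
```

Each week the same call can be repeated with that week's rows. Keeping the `predicted` flag separates model output from true labels, which may matter if the model is later retrained on the combined data.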