NEETHI N Nambiar

Reputation: 55

What do the train and test data and target variables represent in a data set?

Within the below code there are a few variables I'm confused about:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import svm, metrics, datasets

train_data=np.zeros((280,10304))
train_target=np.zeros((280))
test_data=np.zeros((120,10304))
test_target=np.zeros((120))

Can someone please explain what test_data, train_data, test_target and train_target represent and their purpose?

Upvotes: 0

Views: 25

Answers (1)

HGLR

Reputation: 366

Those are unusual names for what is more commonly called:
- X_train (here train_data): the inputs used to train your model, one row per sample
- Y_train (here train_target): the labels of the training rows, i.e. what your model learns to predict
- X_test (here test_data): the inputs used to test your model
- Y_test (here test_target): the true labels of the test rows, which your model's predictions are compared against

To "test" a model mostly means computing some metrics (accuracy, recall, ...) on the test set to judge how satisfied you are with the model once it has been trained.
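As a minimal sketch of that train-then-test workflow: the example below assumes scikit-learn's bundled digits data set and a default SVC, neither of which comes from your code, and writes the label arrays as y_train / y_test. It only illustrates where each of the four arrays is used.

```python
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split

# Assumption: the bundled digits data set stands in for your own data.
digits = datasets.load_digits()
X, y = digits.data, digits.target

# Split into the four arrays: inputs and labels, for training and for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Training uses only the training inputs and their labels.
model = svm.SVC()
model.fit(X_train, y_train)

# Testing: predict from the test inputs, then compare with the true test labels.
y_pred = model.predict(X_test)
accuracy = metrics.accuracy_score(y_test, y_pred)
recall = metrics.recall_score(y_test, y_pred, average="macro")

print("accuracy:", accuracy)
print("macro recall:", recall)
```

The test labels are never passed to fit; they are only used afterwards to score the predictions.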

Note: every input row must have the same length, and the number of input rows must match the number of labels, both when training and when testing.
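Those shape constraints can be checked directly on the arrays from the question. The sketch below uses a hypothetical helper, check_split, which is not part of NumPy or scikit-learn; it just makes the two rules explicit.

```python
import numpy as np

# Arrays with the same shapes as in the question.
train_data = np.zeros((280, 10304))
train_target = np.zeros(280)
test_data = np.zeros((120, 10304))
test_target = np.zeros(120)


def check_split(inputs, labels):
    """Hypothetical helper: verify one label per input row."""
    if inputs.ndim != 2:
        raise ValueError("inputs must be 2-D: (n_samples, n_features)")
    if inputs.shape[0] != labels.shape[0]:
        raise ValueError("number of input rows and number of labels differ")


check_split(train_data, train_target)  # 280 rows, 280 labels
check_split(test_data, test_target)    # 120 rows, 120 labels

# Training and test rows must also have the same number of features.
same_width = train_data.shape[1] == test_data.shape[1]
print("same number of features:", same_width)
```

A 2-D NumPy array already guarantees that all rows have the same length, so the remaining checks are the row/label counts and the matching feature count between the two sets.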

Upvotes: 1
