Reputation: 8297
I am working on this shared task: http://alt.qcri.org/semeval2017/task4/index.php?id=data-and-tools
which is just Twitter sentiment analysis. Since I am pretty new to machine learning, I am not quite sure how to use both the training data and the testing data.
The shared task provides two sets of tweets of the same kind, one without the result (train) and one with the result.
My current understanding of how to use this kind of data in machine learning is as follows:
But the existence of a separate test set confuses me.
Are we supposed to take the results we got from testing on the 10% portion of the 'training set' and compare them to the actual results in the 'testing set'?
Can someone correct my understanding?
Upvotes: 0
Views: 66
Reputation: 562
When training a machine learning model, you feed your algorithm a dataset called the training set. At this stage you tell the algorithm the ground truth for each sample, so it learns from every sample you feed it. The training set is usually about 80% of the whole dataset; the other 20% is the testing set. For the testing set you also know the ground truth of each sample, but you withhold it and let the algorithm predict what it thinks the label is. All the predictions over the testing set are based on what the algorithm learned from the training set you fed it before.
After you make all the predictions over your testing set, you can check how accurate your model is by comparing the ground truth with the predictions the model made.
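The workflow described above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the tweets in `DATA`, the helper names (`split`, `train`, `predict`, `accuracy`), and the word-counting "model" are all made up for this example and are not part of the SemEval task or any library. A real solution would likely use a proper classifier, but the train/predict/compare steps stay the same.

```python
import random
from collections import Counter

# Hypothetical toy data: (tweet text, ground-truth label). Not from the SemEval task.
DATA = [
    ("i love this phone", "positive"),
    ("what a great day", "positive"),
    ("love the great weather", "positive"),
    ("this is great i love it", "positive"),
    ("such a great movie", "positive"),
    ("i hate this phone", "negative"),
    ("what an awful day", "negative"),
    ("hate the awful weather", "negative"),
    ("this is awful i hate it", "negative"),
    ("such an awful movie", "negative"),
]


def split(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and return (training set, testing set)."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train(samples):
    """Learn per-label word counts from samples whose labels are known."""
    counts = {}
    for text, label in samples:
        counts.setdefault(label, Counter()).update(text.split())
    return counts


def predict(model, text):
    """Pick the label whose training tweets shared the most words with the text."""
    words = text.split()
    return max(sorted(model), key=lambda label: sum(model[label][w] for w in words))


def accuracy(model, samples):
    """Fraction of samples where the prediction equals the ground truth."""
    correct = sum(predict(model, text) == label for text, label in samples)
    return correct / len(samples)


# 80% of the data is used for learning, 20% is held out for evaluation.
train_set, test_set = split(DATA)
model = train(train_set)

# The model never saw test_set labels; they are only used to score predictions.
test_accuracy = accuracy(model, test_set)
print(f"train size: {len(train_set)}, test size: {len(test_set)}")
print(f"accuracy on the testing set: {test_accuracy:.2f}")
```

The key point is that `train` only ever receives `train_set`, while the labels in `test_set` are used solely inside `accuracy` to compare against what the model predicted.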
Upvotes: 3