Using training data and testing data in a shared task

Question

I am working on this shared task http://alt.qcri.org/semeval2017/task4/index.php?id=data-and-tools

which is just a Twitter sentiment analysis task. Since I am pretty new to machine learning, I am not quite sure how to use both the training data and the testing data.

The shared task provides two sets of tweets in the same format: one without the results (train) and one with the results.

My current understanding of how this kind of data is used in machine learning is as follows:

Training set: we are supposed to split this into training and testing portions (maybe 90% training and 10% testing?)

But the existence of a separate test set confuses me.

Are we supposed to take the results we get on the 10% portion of the 'training set' and compare them to the actual results on the 'testing set'?

Can someone correct my understanding?

Flika205 · Accepted Answer

When you train a machine learning model, you feed the algorithm a dataset called the training set. At this stage you tell the algorithm the ground truth for each sample, so that it can learn from every sample you give it. The training set is commonly about 80% of the whole dataset.

The remaining 20% or so is the testing set. Here you also know the ground truth for each sample, but you do not show it to the algorithm; instead you let the algorithm predict what it thinks the truth is. Those predictions are based only on what the algorithm learned from the training set. Once you have predictions for the whole testing set, you can check how accurate your model is by comparing them with the ground truth.
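To make the split, train, predict, and compare steps concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy tweets, the helper names (`split`, `train`, `predict`, `accuracy`), and the very naive word-voting classifier, which only stands in for whatever model you actually use. It is not the method the shared task expects.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy data: (tweet text, ground-truth label) pairs.
labeled = [
    ("i love this phone", "positive"),
    ("what a great day", "positive"),
    ("love the new update", "positive"),
    ("great service and great food", "positive"),
    ("this is awesome", "positive"),
    ("i hate this phone", "negative"),
    ("what a terrible day", "negative"),
    ("hate the new update", "negative"),
    ("terrible service and bad food", "negative"),
    ("this is awful", "negative"),
]

def split(samples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the samples and cut it into train and test parts."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def train(samples):
    """Count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for text, label in samples:
        for word in text.split():
            counts[word][label] += 1
    # Fallback label for tweets that contain no known words.
    majority = Counter(label for _, label in samples).most_common(1)[0][0]
    return counts, majority

def predict(model, text):
    """Let every known word vote for the labels it was seen with."""
    counts, majority = model
    votes = Counter()
    for word in text.split():
        votes.update(counts.get(word, {}))
    return votes.most_common(1)[0][0] if votes else majority

def accuracy(model, samples):
    """Fraction of samples whose prediction matches the ground truth."""
    correct = sum(predict(model, text) == label for text, label in samples)
    return correct / len(samples)

train_set, test_set = split(labeled)   # 80% / 20%
model = train(train_set)               # the model sees labels only here
print(f"train size: {len(train_set)}, test size: {len(test_set)}")
print(f"test accuracy: {accuracy(model, test_set):.2f}")
```

The point of the sketch is that `train` never sees `test_set`; the test labels are used only inside `accuracy`, after the predictions have been made.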
