In what form should I have my data for 1v1 match prediction?

Question

I'm currently working on an AI to predict the winner of a 1v1 match in a video game. The problem is that I do not know what form the data (inputs and labels) should take. For now I have the following data:

the day of the match (there is a day 0)
name of player 1
country of player 1
name of player 2
country of player 2
winner

I can also get the score of the match, but sometimes it is a best of 3 and sometimes a best of 5, so I do not know whether it would be reliable.

Based on the data I have, my two main questions are:

Is it possible that the AI predicts two different results if I just swap the player columns?
If yes, how can I avoid it?
How do I tell the AI that the prediction I want must be one of the two players I present to it, and not some other player?

Thanks in advance, I really appreciate it.

Berkay Berabi · Accepted Answer

It seems that your data is categorical, although I do not fully understand what you mean by player1 and player2. Do you have only the names of the players, or also some skill attributes?

Neural networks, like any other AI algorithm, work with numbers. They know nothing about real-world concepts such as day names (Monday, Tuesday, etc.) or country names. You have to create a mapping between these real-world values and numbers.
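A minimal sketch of such a mapping in plain Python. The names `build_mapping`, `encode` and `UNKNOWN_ID`, as well as the sample countries, are made up for illustration; values that were not seen when the mapping was built fall back to -1.

```python
UNKNOWN_ID = -1  # reserved for values never seen while building the mapping


def build_mapping(values):
    """Give each distinct value an integer ID, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return mapping


def encode(value, mapping):
    """Look up the ID of a value, falling back to UNKNOWN_ID if it is unseen."""
    return mapping.get(value, UNKNOWN_ID)


# Hypothetical training values for one categorical column.
train_countries = ["France", "Switzerland", "France", "Japan"]
country_ids = build_mapping(train_countries)

encoded = [encode(c, country_ids) for c in train_countries]
unseen = encode("Brazil", country_ids)  # not in the training values
```

One mapping per column is enough; the same integer can appear in different columns with different meanings.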

Since these features are categorical (they cannot take continuous values), you can map the days to integers from 0 to N. You can do the same for the countries: every country gets a unique ID. You have to be careful, though: if during inference the model receives a day or country that was not present in the training data, it will be unknown to the model. So either add all the countries that are relevant or, if you cannot know them in advance, reserve the label -1 for an unknown country or day.

For each feature you will have a column, and each row represents a match. Each column holds the corresponding IDs for that feature and match, and you can pass this data to the AI. By the way, it is fine to use the same IDs/numbers for different features (so 1 can mean Tuesday in one column and Switzerland in another).

Answers to your questions:

Yes, in theory this can happen. If you have enough samples and a good model, the model itself might learn that the player order does not matter.
If you can, feed relative values to the model instead of absolute values. For example, if you have skill attributes/scores for the players, instead of feeding both scores to the system you can build your data from the difference of these scores. E.g. shooting is 80 for player1 and 78 for player2. You have a column for shooting and put the value 80-78 = 2 there, so the model knows player1 is better by 2; in the reverse case you put -2, and the model knows player2 is better by 2 in that category. Another approach is to include each match twice in the training data, the second time with the player order reversed (and the winner label flipped accordingly), so that the model can learn the symmetry from the data.
That is easy: your model will not output player IDs or anything else related to the players. Your problem is a binary classification problem. Your model should always output either 0 or 1, with 0 meaning player1 wins and 1 meaning player2 wins, and you can then convert this output back to the players yourself.
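The three answers above can be put together in a rough sketch. It assumes a per-player "shooting" score is available, which the question's data does not include; the helper names and the sample players are hypothetical. Each match becomes one row with a difference feature and a 0/1 label, a mirrored copy is added with the players swapped, and a predicted label is converted back to a player name.

```python
def to_example(p1, p2, shooting, winner):
    """Build one training row: relative feature plus binary label."""
    diff = shooting[p1] - shooting[p2]   # > 0 means player1 is better
    label = 0 if winner == p1 else 1     # 0: player1 wins, 1: player2 wins
    return {"p1": p1, "p2": p2, "shooting_diff": diff, "label": label}


def mirrored(row):
    """Same match with the players swapped: negate the diff, flip the label."""
    return {
        "p1": row["p2"],
        "p2": row["p1"],
        "shooting_diff": -row["shooting_diff"],
        "label": 1 - row["label"],
    }


def winner_name(row, predicted_label):
    """Convert the model's 0/1 output back to a player name."""
    return row["p1"] if predicted_label == 0 else row["p2"]


# Hypothetical skill scores, matching the 80 vs 78 example.
shooting = {"Alice": 80, "Bob": 78}

row = to_example("Alice", "Bob", shooting, winner="Alice")
training_rows = [row, mirrored(row)]  # each match appears in both orders
```

Because the output is only ever 0 or 1, the prediction can never name a player outside the pair that was presented.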
