farynaa

Reputation: 370

sklearn prepare dataset for game winner prediction

I want to predict the result of the match based on the results of previous matches. For each match I have this data: ids of team1 players, ids of team2 players, weapon ids of team1 players and weapon ids of team2 players. For example:

{
  "team1_ids": [
    12321323,
    1421242,
    54325235
  ],
  "team2_ids": [
    55432453,
    242462,
    2234444
  ],
  "team1_weapon_ids": [
    1,
    3,
    5
  ],
  "team2_weapon_ids": [
    2,
    4,
    6
  ]
}

I have the same kind of record for every other match. In total there are about 30 different player ids and only 6 kinds of weapons, and within a match each weapon is used by only one player.

Is there any simple way to prepare the dataset for further sklearn classification? I was looking into the different sklearn label preprocessing tools but haven't found the answer.

It seems that something like sklearn's OneHotEncoder is suitable, but it does not take into account that the order of player ids within a team doesn't matter for the game result. For the y values I use binary labels: 1 if team1 wins and -1 if team2 wins.

Upvotes: 2

Views: 273

Answers (1)

Dimgold

Reputation: 2944

As far as I can see, you only need to encode how many times each weapon type was used by each team.

Therefore I would describe each match record as 6 features per team (the usage count of each weapon, 12 features in total) and 1 label column.

For example:

team1_weapon1 | ... | team1_weapon6 | team2_weapon1 | ... | team2_weapon6 | Result
      1       | ... |       1       |       0       | ... |       1       |   -1
      0       | ... |       0       |       1       | ... |       1       |    1

Here each team[i]_weapon[j] holds the count (or a binary flag, if each weapon is unique per game) of weapons of type j used by team i, and Result is the game outcome.
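A minimal sketch of how such rows could be built, assuming the weapon ids are exactly 1 to 6 and each match is a dict shaped like the one in the question. The names `WEAPON_IDS`, `weapon_counts` and `encode_match`, the second match and the `y` labels are hypothetical and only there for illustration. Because it counts weapons instead of looking at positions, the order of the ids inside a team does not affect the features.

```python
from collections import Counter

# Assumption: the 6 weapon ids are 1..6, as in the question's example.
WEAPON_IDS = [1, 2, 3, 4, 5, 6]


def weapon_counts(weapon_ids):
    """Return one count per weapon type, independent of the input order."""
    counts = Counter(weapon_ids)
    return [counts[w] for w in WEAPON_IDS]


def encode_match(match):
    """Build the 12 features: 6 counts for team1 followed by 6 for team2."""
    return (weapon_counts(match["team1_weapon_ids"])
            + weapon_counts(match["team2_weapon_ids"]))


# The first match uses the weapon ids from the question; the second is made up.
matches = [
    {"team1_weapon_ids": [1, 3, 5], "team2_weapon_ids": [2, 4, 6]},
    {"team1_weapon_ids": [6, 2, 3], "team2_weapon_ids": [5, 1, 4]},
]

X = [encode_match(m) for m in matches]
y = [1, -1]  # hypothetical labels: 1 if team1 wins, -1 if team2 wins

for row, label in zip(X, y):
    print(row, label)
```

The resulting `X` and `y` are plain lists, which sklearn classifiers accept in `fit(X, y)`. If every weapon really appears at most once per match, the counts are just 0/1 flags.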

Upvotes: 1
