Reputation: 18705
I'm learning machine learning, and my dataset consists of 7 columns:
home_team, away_team, home_odds, away_odds, home_score, away_score, 1_if_home_wins_else_0
To be able to feed the teams to TensorFlow, I converted every team to an integer, so the first two columns are integers (like database IDs).
There are 10k rows in the CSV.
example
Now I'm trying to modify the code from the Pima Indians diabetes example to predict wins by the home team.
But it returns the same prediction (0) for any input. When I tried it on the odds only, the predictions were more accurate and at least not all the same.
code
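from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# load the dataset
dataset = loadtxt('football_data.csv', delimiter=',')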
# split into input (X) and output (y) variables
X = dataset[:, 0:4]
y = dataset[:, 6]
# define the keras model
model = Sequential()
model.add(Dense(12, input_dim=4, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
model.fit(X, y, epochs=150, batch_size=10)
# evaluate the keras model
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy * 100))
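# make class predictions with the model
# (predict_classes was removed in TensorFlow 2.6; threshold the sigmoid output instead)
predictions = (model.predict(X) > 0.5).astype('int32').ravel()
# summarize the first 50 cases
for i in range(50):
    print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i], y[i]))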
Do you know where the problem is?
Upvotes: 0
Views: 161
Reputation: 775
Converting the first two columns (team names) to integers and feeding them in as plain numbers does not make sense. Doing so implies that teams with similar IDs, such as 1146 and 1179, will perform similarly, and that teams with very different IDs, such as 4 and 6542, will perform very differently. Data like this is usually represented in a different manner, or even excluded from the dataset.
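One common "different manner" is one-hot encoding, where each team gets its own 0/1 column so that no ordering between IDs is implied. The following is only a minimal sketch in plain Python; the function name `one_hot_teams` and the sample IDs are made up for illustration and are not part of the original code.

```python
def one_hot_teams(team_ids):
    """Return one-hot rows for team_ids plus the ID -> column index mapping."""
    # Sort the distinct IDs so the column order is deterministic.
    index_of = {team: i for i, team in enumerate(sorted(set(team_ids)))}
    rows = []
    for team in team_ids:
        row = [0] * len(index_of)   # one column per distinct team
        row[index_of[team]] = 1     # mark only this team's column
        rows.append(row)
    return rows, index_of


# Hypothetical team IDs, reusing the ones mentioned above.
home_teams = [4, 1146, 6542, 4]
encoded, index_of = one_hot_teams(home_teams)
for team, row in zip(home_teams, encoded):
    print(team, row)
```

With many teams these vectors become wide, and the same mapping would have to be shared between the home and away columns, so this may or may not be practical for your dataset.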
I would exclude those columns in this case, since the odds seem to contain all the necessary information. I wouldn't even use a neural network for this; I would just compare the odds. However, I understand that you want a simple dataset for learning purposes, in which case using only the odds would be fine.
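As a rough sketch of what "only the odds" means in terms of columns, here is the selection done with the standard `csv` module. The two sample rows are invented and only follow the column order given in the question.

```python
import csv
import io

# Hypothetical rows in the question's column order:
# home_team, away_team, home_odds, away_odds, home_score, away_score, home_win
sample_csv = """4,1146,0.6,0.4,2,1,1
1179,6542,0.3,0.7,0,3,0
"""

X, y = [], []
for row in csv.reader(io.StringIO(sample_csv)):
    # Keep only the two odds columns as inputs and the last column as the label.
    X.append([float(row[2]), float(row[3])])
    y.append(int(row[6]))

print(X, y)
```

With the `loadtxt` array from the question, the equivalent slice would be `dataset[:, 2:4]`, and the first `Dense` layer would then take `input_dim=2`.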
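Mind, though, that the neural network will probably learn to assign the win to the team with the better odds of winning. If higher odds mean a higher chance of winning, that is the following rule (for bookmaker payout odds, where the favourite has the lower number, flip the comparison):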
if home_odds > away_odds:
    one_if_home_wins_else_zero = 1
else:
    one_if_home_wins_else_zero = 0
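To see whether a trained network actually beats this rule, you could measure the rule's accuracy on your data first. This is a sketch under the same assumption as the rule itself (larger odds mean a more likely win). The function name and the sample rows are hypothetical.

```python
def favourite_baseline_accuracy(rows):
    """rows: iterable of (home_odds, away_odds, home_win) tuples."""
    correct = 0
    total = 0
    for home_odds, away_odds, home_win in rows:
        # Same rule as above: pick the home team when its odds are larger.
        predicted = 1 if home_odds > away_odds else 0
        correct += int(predicted == home_win)
        total += 1
    # Avoid dividing by zero on an empty dataset.
    return correct / total if total else 0.0


# Made-up rows: (home_odds, away_odds, 1 if home wins else 0)
sample = [(0.6, 0.4, 1), (0.3, 0.7, 0), (0.55, 0.45, 0), (0.2, 0.8, 0)]
print(favourite_baseline_accuracy(sample))  # 0.75
```

If the network's accuracy ends up close to this number, it has most likely learned nothing beyond the comparison.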
Upvotes: 1