I am working on an exercise to build a simple machine learning algorithm in Python. It uses a dataframe whose content is a sample of pokemon battles and their winners. What I am trying to do: I have a dataset with more than 50000 pokemon battles, and I want to count how many times each pokemon won and insert that number into a table that contains some data about each pokemon. The code is (in a Jupyter notebook):
import pandas as pd
import numpy as np
pokemon = pd.read_csv('datas/pokemons_data.csv')
combates = pd.read_csv('datas/combats.csv')
pokemon
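# Map each numeric ID ('#') to the pokemon's name
nome_corrigido = dict(zip(pokemon['#'], pokemon['Name']))
# Replace the IDs in the three battle columns with names
combates = combates[['First_pokemon', 'Second_pokemon',
                     'Winner']].replace(nome_corrigido)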
combates
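# Appearances in each column and number of wins (Series indexed by pokemon name)
primeiro = combates['First_pokemon'].value_counts()
segundo = combates['Second_pokemon'].value_counts()
vitorias = combates['Winner'].value_counts()
# fill_value=0 keeps a pokemon that appears in only one column from becoming NaN
total_de_batalhas = primeiro.add(segundo, fill_value=0)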
percentual_vitorias = vitorias/total_de_batalhas
percentual_vitorias = percentual_vitorias.sort_values()
percentual_vitorias.head()
vitorias.head()
# Sum of all base stats (parentheses allow the expression to span lines)
pokemon['status_total'] = (pokemon['Hit Points'] + pokemon['Attack'] +
                           pokemon['Defense'] + pokemon['Sp. Atk'] +
                           pokemon['Sp. Def'] + pokemon['Speed'])
pokemon['vitorias'] = vitorias[0]
pokemon['percentual_vitorias'] = percentual_vitorias[0]
pokemon.iloc[:, [1, -3]].head()
from sklearn.model_selection import train_test_split
x = pokemon['status_total']
y = pokemon['percentual_vitorias']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25)
from sklearn.linear_model import LinearRegression
x_train = np.array(x_train).reshape(len(x_train) , 1)
y_train = np.array(y_train).reshape(len(y_train) , 1)
y_test = np.array(y_test).reshape(len(y_test) , 1)
x_test = np.array(x_test).reshape(len(x_test) , 1)
modelo_linear = LinearRegression()
modelo_linear.fit(x_train, y_train)
# Jupyter output of the fit() call: LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
modelo_linear.predict(x)
Questions:

1) The first problem happens in the line pokemon['vitorias'] = vitorias[0]. Clearly the new column will be filled only with the value 152, which is the first entry in the vitorias array (without the [0] I get NaN, because I am assigning an entire vector to a cell in the dataset). What I intended to do: in combats.csv each row has one column with a pokemon, a second column with another pokemon, and a third column with the winner of that battle. I counted the number of times each pokemon won and created this array, in which each entry is a number of wins, sorted in descending order. 152 is the number of times Mewtwo won his battles. Now I want to fill the new column 'vitorias' with 152 only for Mewtwo, and with the corresponding number of victories for every other pokemon. I do not know how to do it.

My big problem is mapping, for example, the 152 (the first entry in the vitorias vector) to the row where Mewtwo appears in the other dataset, the pokemon table. That is, mapping the number of victories of each pokemon in the vitorias vector to the corresponding entry in the pokemon table.
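A minimal sketch of one way to do this, using tiny made-up tables in place of the real CSVs. Since combates['Winner'] was already replaced by names, vitorias is a Series indexed by name, so Series.map can look up each row's Name. Assigning vitorias directly gives NaN because pandas aligns a Series with the column by index labels (0, 1, 2, ... in pokemon versus names in vitorias), not by position. Pokemon that never won are absent from value_counts(), so they are filled with 0.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables (not the real data).
pokemon = pd.DataFrame({'#': [1, 2, 3], 'Name': ['Bulbasaur', 'Ivysaur', 'Venusaur']})
combates = pd.DataFrame({'Winner': ['Ivysaur', 'Bulbasaur', 'Ivysaur']})

# Wins per pokemon, indexed by name (same shape as vitorias in the question).
vitorias = combates['Winner'].value_counts()

# Look up each row's name in vitorias; pokemon with no wins get 0.
pokemon['vitorias'] = pokemon['Name'].map(vitorias).fillna(0).astype(int)
print(pokemon)
```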
2) In the line modelo_linear.predict(x) I get "Expected 2D array, got 1D array instead". Why is it expecting a 2D array? How can I fix it?
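scikit-learn's predict, like fit, expects X with shape (n_samples, n_features), even when there is a single feature. x = pokemon['status_total'] is a 1-D Series of shape (n,); fit worked only because x_train had been reshaped first. A short sketch with made-up values showing two ways to get an (n, 1) input:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for pokemon['status_total'].
pokemon = pd.DataFrame({'status_total': [318, 405, 525, 625]})

x = pokemon['status_total']           # 1-D Series, shape (4,)
x_2d = x.to_numpy().reshape(-1, 1)    # 2-D array, shape (4, 1)
x_df = pokemon[['status_total']]      # double brackets keep a 1-column DataFrame, shape (4, 1)

print(x.shape, x_2d.shape, x_df.shape)
```

Either 2-D form can then be passed as modelo_linear.predict(x_2d). reshape(-1, 1) lets NumPy infer the number of rows, so it could also replace the reshape(len(x_train), 1) calls; the target y can stay 1-D.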
This is the pokemons_data.csv:
I am trying to insert the number of wins of, say, Bulbasaur into a new column next to the "type2" column. So I go to this dataframe, combats.csv:
and try to count the number of wins of each pokemon, identified by its ID (1 for Bulbasaur, 2 for Ivysaur, etc.).
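If the counting is done on the numeric IDs instead (before replacing them with names), the counts can be matched against the '#' column directly. A sketch on a hypothetical three-battle sample in the same layout as combats.csv; it also uses Series.add(..., fill_value=0) for the battle count, because plain primeiro + segundo produces NaN for an ID that only ever appears in one of the two columns (IDs 1 and 3 below).

```python
import pandas as pd

# Hypothetical sample in the same layout as combats.csv (IDs, not names).
combates = pd.DataFrame({
    'First_pokemon':  [1, 2, 1],
    'Second_pokemon': [2, 3, 3],
    'Winner':         [2, 2, 1],
})
pokemon = pd.DataFrame({'#': [1, 2, 3], 'Name': ['Bulbasaur', 'Ivysaur', 'Venusaur']})

# Battles fought = appearances in either column; fill_value=0 avoids NaN.
batalhas = combates['First_pokemon'].value_counts().add(
    combates['Second_pokemon'].value_counts(), fill_value=0)
vitorias = combates['Winner'].value_counts()

# Align on the '#' column, not on the DataFrame's 0..n-1 index.
pokemon['batalhas'] = pokemon['#'].map(batalhas).fillna(0).astype(int)
pokemon['vitorias'] = pokemon['#'].map(vitorias).fillna(0).astype(int)
pokemon['percentual_vitorias'] = pokemon['vitorias'] / pokemon['batalhas']
print(pokemon)
```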
Use pd.merge() (or DataFrame.join(); pandas has no top-level pd.join) between combates and pokemon so you can get the names for First_pokemon/Second_pokemon. The rest will be easy.
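A minimal sketch of the merge approach on hypothetical data: one left merge per ID column attaches a name column, and the win counts are then merged back into pokemon by name. The *_name column names are illustrative choices, not part of the original answer.

```python
import pandas as pd

pokemon = pd.DataFrame({'#': [1, 2, 3], 'Name': ['Bulbasaur', 'Ivysaur', 'Venusaur']})
combates = pd.DataFrame({
    'First_pokemon':  [1, 2, 1],
    'Second_pokemon': [2, 3, 3],
    'Winner':         [2, 2, 1],
})

# Lookup table: ID -> name.
nomes = pokemon[['#', 'Name']]

# Attach a name column for each ID column (left merge keeps every battle row).
for col in ['First_pokemon', 'Second_pokemon', 'Winner']:
    combates = combates.merge(
        nomes.rename(columns={'#': col, 'Name': col + '_name'}),
        on=col, how='left')

# Count wins by name, turn the counts into a two-column table, merge into pokemon.
vitorias = (combates['Winner_name'].value_counts()
            .rename_axis('Name').reset_index(name='vitorias'))
pokemon = pokemon.merge(vitorias, on='Name', how='left')
pokemon['vitorias'] = pokemon['vitorias'].fillna(0).astype(int)
print(combates)
print(pokemon)
```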