Reputation: 5
I need to go through a CSV file containing information about video games and create a new variable based on each game's user score. Here is my code:
#Imports
import pandas
import numpy as np
import matplotlib.pyplot as plt
data = pandas.read_csv("Data Collections/metacritic_games_2016_11.csv", encoding='latin-1')
data['year'] = pandas.DatetimeIndex(data['release']).year
data = data[data["year"] >= 2000]
rating = []
for index, row in data.iterrows():
    if row['user_score'] >= 7.5:
        rating.append("Good")
    elif row['user_score'] >= 6.5:
        rating.append("Average")
    elif row['user_score'] >= 0:
        rating.append("Bad")
data["new_rating"] = pandas.Series(rating)
year = 2000
index = 0
while year != 2016:
    vals = data[data["year"] == year]["new_rating"].value_counts()
    plt.bar(index, vals["Bad"], color='#494953')
    plt.bar(index, vals["Average"], color='#6A7EFC', bottom=vals["Bad"])
    plt.bar(index, vals["Good"], color='#FF5656', bottom=vals["Average"] + vals["Bad"])
    index += 1
    year += 1
plt.show()
However, I keep getting an error saying:
if row['user_score'] >= 7.5:
TypeError: '>=' not supported between instances of 'str' and 'float'
I'm not sure what to do here. Any help is appreciated.
Upvotes: 0
Views: 179
Reputation: 8954
At least one of the values in your user_score column is being read as a string for some reason. Presuming it's not a value like "seventeen", you can fix that with
data['user_score'] = data['user_score'].astype(float)
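If the column does turn out to contain non-numeric placeholders, astype(float) will raise a ValueError instead of converting. A minimal sketch of a more tolerant conversion uses pandas.to_numeric with errors="coerce", which turns unparseable entries into NaN. The sample values below (including "tbd") are made up for illustration and are not taken from your file:

```python
import pandas

# Hypothetical sample; "tbd" stands in for any non-numeric placeholder.
sample = pandas.DataFrame({"user_score": ["8.1", "6.9", "tbd", "3.4"]})

# errors="coerce" replaces anything that cannot be parsed with NaN
sample["user_score"] = pandas.to_numeric(sample["user_score"], errors="coerce")

# Drop rows without a usable score before comparing against thresholds
sample = sample.dropna(subset=["user_score"])
```

Whether dropping those rows is appropriate depends on what the placeholders mean in your data, so treat that last step as an assumption.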
I would also suggest replacing the code you have for creating your rating column. Instead of this:
rating = []
for index, row in data.iterrows():
    if row['user_score'] >= 7.5:
        rating.append("Good")
    elif row['user_score'] >= 6.5:
        rating.append("Average")
    elif row['user_score'] >= 0:
        rating.append("Bad")
data["new_rating"] = pandas.Series(rating)
you should do something like this:
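# Uses the question's imports (pandas, numpy as np).
# right=False makes the bins [0, 6.5), [6.5, 7.5), [7.5, inf), matching the >= checks above.
group_boundaries = [0, 6.5, 7.5, np.inf]
group_labels = ['bad', 'average', 'good']
data['rating'] = pandas.cut(data['user_score'],
                            bins=group_boundaries,
                            labels=group_labels,
                            right=False)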
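To see how scores that sit exactly on a boundary get binned, here is a small self-contained sketch on made-up scores. It assumes right=False, so each bin includes its lower edge and a score of exactly 7.5 counts as good, the same as the >= comparisons in the loop. With the default right=True, 7.5 would land in the average bin and 0 would fall outside every bin.

```python
import numpy as np
import pandas

# Made-up scores chosen to sit on and around the bin edges
scores = pandas.Series([0.0, 6.4, 6.5, 7.4, 7.5, 9.2])

labels = pandas.cut(scores,
                    bins=[0, 6.5, 7.5, np.inf],
                    labels=["bad", "average", "good"],
                    right=False)  # bins are [0, 6.5), [6.5, 7.5), [7.5, inf)

print(labels.tolist())
# ['bad', 'bad', 'average', 'average', 'good', 'good']
```

Note that the labels here are lowercase, so any later lookup such as vals["Bad"] would need to use the same spelling.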
Upvotes: 2