Reputation: 569
I am trying to create a new variable "spg", or shots per game using Pandas on Python. This new variable is simply s/gp or shots/games played. I would like to add this new variable to my current dataframe so I can print it to excel.
I would like my final columns to look like this: columns = ("name", "team", "gp", "g", "s", "spg")
# import modules
from urllib.request import urlopen
from lxml.html import fromstring
import pandas as pd
# connect to url
url = "https://www.hockey-reference.com/leagues/NHL_2020_skaters.html"
# remove HTML comment markup
content = str(urlopen(url).read())
comment = content.replace("-->", "").replace("<!--", "")
tree = fromstring(comment)
# setting up excel columns
columns = ("name", "team", "gp", "g", "s")
df = pd.DataFrame(columns=columns)
# parsing data while using loop
for idx, row in enumerate(tree.xpath('//table[@id="stats"]/tbody/tr[not(@class="thead")]')):
name = row.xpath('.//td[@data-stat="player"]')[0].text_content()
team = row.xpath('.//td[@data-stat="team_id"]')[0].text_content()
gp = row.xpath('.//td[@data-stat="games_played"]')[0].text_content()
goals = row.xpath('.//td[@data-stat="goals"]')[0].text_content()
s = row.xpath('.//td[@data-stat="shots"]')[0].text_content()
# create pandas dataframe to export data to excel.
df.loc[idx] = (name, team, gp, goals, s) # I would like to add "spg"
print(df)
# write data to excel
writer = pd.ExcelWriter('NHL skater20.xlsx')
df.to_excel(writer, 'Sheet1')
writer.save()
Any recommendations on the best way to do this? I greatly appreciate any help or feedback I can get. Thanks in advance!
Upvotes: 0
Views: 536
Reputation: 15738
Simple: after the loop, add this line:
df['spg'] = df['s'] / df['gp']
UPD: (tested and working)
# import modules
from urllib.request import urlopen
from lxml.html import fromstring
import pandas as pd
# connect to url
url = "https://www.hockey-reference.com/leagues/NHL_2020_skaters.html"
# remove HTML comment markup
content = str(urlopen(url).read())
comment = content.replace("-->", "").replace("<!--", "")
tree = fromstring(comment)
# setting up excel columns
columns = ("name", "team", "gp", "g", "s")
df = pd.DataFrame(columns=columns)
# parsing data while using loop
for idx, row in enumerate(tree.xpath('//table[@id="stats"]/tbody/tr[not(@class="thead")]')):
name = row.xpath('.//td[@data-stat="player"]')[0].text_content()
team = row.xpath('.//td[@data-stat="team_id"]')[0].text_content()
gp = row.xpath('.//td[@data-stat="games_played"]')[0].text_content()
goals = row.xpath('.//td[@data-stat="goals"]')[0].text_content()
s = row.xpath('.//td[@data-stat="shots"]')[0].text_content()
# create pandas dataframe to export data to excel.
# this is fine for a small dataset, however
# for a bigger one writing a generator is much more efficient
df.loc[idx] = (name, team, gp, goals, s) # I would like to add "spg"
# print(df) # commented out as unnecessary
# at this point these columns are .text_content(), converting to numbers
df = df.astype({'gp': int, 'g': int, 's': int})
# main part: adding a new column
df['spg'] = df['s'] / df['gp']
# write data to excel
writer = pd.ExcelWriter('NHL skater20.xlsx')
df.to_excel(writer, 'Sheet1')
writer.save()
Upvotes: 1
Reputation: 598
To send it to excel you can use either of these:
DataFrame.to_excel('filename.xlsx')
DataFrame.to_csv('filename.csv')
Upvotes: 1