Reputation: 13
new to python and have a problem to solve that I've hit a roadblock. Looking to calculate the monthly and yearly average price for Google and tell us the best and worst six months and the best and worst six years for Google from 2004 to Oct 2019 The average price is defined as ((v1*c1)+(v2*c2)+(v3*c3)+(v4*c4)...+(vn*cn)) / (v1+v2+v3+v4...+vn) where vi is the volume for day i and ci is the adjusted close price for day i.
I've been able to import the data from the web but now I'm trying to at least get one year into a tuple and then work out from that but can't figure out how to convert the data I have into a tuple. I try function = tuple but get errors. I know how to make a tuple but not how to make one from the data I have or if I'm missing a step to get the data into one or if I'm going off track with this approach
Here is my code so far if anyone could point me in the right direction it would be appreciated
# url = "http://193.1.33.31:88/pa1/GOOGL.csv"
import csv
import begin
from read_from_file_or_net import get_stuff_from_net as gn
def main(csv_file: 'URL of CSV file'):
# def main(csv_file):
try:
print(f"{csv_file}")
my_file = gn(csv_file)
# with open(f".cache/{my_file}", "w") as output:
# output.write(my_file)
my_file = my_file.split("\n")
for row in my_file:
row = row.strip().split(",")
for cell in row:
if cell.isalpha():
print(f"{cell}"+"", end="")
elif "-" in cell:
print(f"{cell}", end="")
elif "." in cell:
print(f"{float(cell):>10.2f}", end="")
elif cell.isnumeric():
print(f"{int(cell):>15d}", end="")
elif not cell.isspace():
print(f"{cell}", end=" ")
#elif cell.istitle():
# print(f"{cell}", end="")
#else:
#print("?", end="")
print()
except Exception as e:
print(f"{e}")
if __name__ == "__main__":
main("http://193.1.33.31:88/pa1/GOOGL.csv")
Upvotes: 1
Views: 87
Reputation: 142631
You should use pandas
for this. It has many powerful functions which doesn't need for
-loop.
You can read csv
directly from web page
import pandas as pd
df = pd.read_csv('http://193.1.33.31:88/pa1/GOOGL.csv')
print(df.columns)
print(df.head())
You can select one year - ie. 2018
year2018 = df[ (df['Date'] >= '2018-01-01') & (df['Date'] < '2019-01-01') ]
And calculate your value
result = (year2018['Volume'] * year2018['Adj Close']).sum() / year2018['Volume'].sum()
print(result)
EDIT: Silimar for other years
for year in range(2004, 2019):
year = str(year)
data = df[ df['Date'].str.startswith(year) ]
result = (data['Volume'] * data['Adj Close']).sum() / data['Volume'].sum()
print(year, result)
Result:
2004 80.44437157567273
2005 137.4076040074354
2006 203.03824165240846
2007 273.04059204266287
2008 227.86912213843564
2009 206.71221450434697
2010 268.65533171697064
2011 283.70689930771306
2012 322.70466840310667
2013 437.32701278816154
2014 567.9540540371448
2015 623.3613056057101
2016 757.9295821975054
2017 940.267270383813
2018 1115.287148437416
EDIT: If you keep results on list as tuples (result, year)
then you can sort them to get the best and the worst years
import pandas as pd
df = pd.read_csv('http://193.1.33.31:88/pa1/GOOGL.csv')
#df['Date'] = pandas.to_datetime(df['Date'])
#print(df.columns)
year2018 = df[ (df['Date'] >= '2018-01-01') & (df['Date'] < '2019-01-01') ]
result = (year2018['Volume'] * year2018['Adj Close']).sum() / year2018['Volume'].sum()
#print(result)
all_results = []
for year in range(2004, 2019):
year = str(year)
data = df[ df['Date'].str.startswith(year) ]
result = (data['Volume'] * data['Adj Close']).sum() / data['Volume'].sum()
all_results.append( (result, year) )
#print(year, result)
print('--- sorted by result ---')
sorted_results = sorted(all_results)
for result, year in sorted_results:
print(year, result)
Result:
--- sorted by result ---
2004 80.44437157567273
2005 137.4076040074354
2006 203.03824165240846
2009 206.71221450434697
2008 227.86912213843564
2010 268.65533171697064
2007 273.04059204266287
2011 283.70689930771306
2012 322.70466840310667
2013 437.32701278816154
2014 567.9540540371448
2015 623.3613056057101
2016 757.9295821975054
2017 940.267270383813
2018 1115.287148437416
Using slice sorted_results[:6]
you can get six the worst years, using sorted_results[-6:]
you can get six the best years. You can also use reversed()
if you want in different order.
EDIT: Almost all the same without pandas
import requests
import csv
def main(url):
r = requests.get(url)
lines = r.text.split('\n')
headers = lines[0].split(',')
data = []
for line in lines[1:]:
line = line.strip()
if line: # skip empty lines
row = line.strip().split(',')
# convert string to float/int
row[1] = float(row[1])
row[2] = float(row[2])
row[3] = float(row[3])
row[4] = float(row[4])
row[5] = float(row[5])
row[6] = int(row[6])
data.append(row)
return headers, data
if __name__ == "__main__":
headers, data = main('http://193.1.33.31:88/pa1/GOOGL.csv')
print(headers)
print('--- data ---')
print(data[0])
print(data[-1])
# get only year 2018
year2018 = []
for row in data:
if '2018-01-01' <= row[0] < '2019-01-01':
year2018.append(row)
print('--- year 2018 ---')
print(year2018[0])
print(year2018[-1])
# your calculation
a = 0
b = 0
for row in year2018:
a += row[5] * row[6]
b += row[6]
result = a/b
print(result)
Upvotes: 1