I have a text file containing 365 entries, one per line, abbreviated below. The first field on each line is a date and the second is the closing value of the Dow Jones:
8/28/2018|26064.01953
8/29/2018|26124.57031
8/30/2018|25986.91992
I am using the following code:
import os
import math
import statistics

def main():
    infile = open('DJI.txt', 'r')
    values = infile.read()
    infile.close()
    values = values.split("\n")
    values = [value.split("|") for value in values]
    avg = sum([float(l[1]) for l in values]) / len(values)
    highest = max([float(l[1]) for l in values])
    lowest = min([float(l[1]) for l in values])
    values.sort(key=lambda x: x[1])
    print(avg)
    print(highest)
    print(lowest)
    print(values)

main()
I am struggling with two more tasks for this code. The first is to find the average close value per month, rather than the average value for the whole year. The second is that for the highest and lowest values, the date on which the value occurred should also be displayed alongside the value.
Your help is greatly appreciated.
Upvotes: 2
The solution below does not use any external library:
from collections import defaultdict

monthly_data = defaultdict(list)

with open('DJI.txt') as f:
    lines = [l.strip() for l in f.readlines()]

for line in lines:
    values = line.split('|')
    date = values[0]
    month = date.split('/')[0]
    value = float(values[1])
    monthly_data[month].append((value, date))

for month, values in monthly_data.items():
    _values = [v[0] for v in values]
    avg = sum(_values) / len(_values)
    _min = min(values, key=lambda x: x[0])
    _max = max(values, key=lambda x: x[0])
    print('Month: {}. avg value {}, min value {}, max value {}'.format(month, avg, _min, _max))
DJI.txt
8/28/2018|26064.01953
8/29/2018|26124.57031
8/30/2018|25986.91992
9/28/2018|16064.01953
9/10/2018|12.99
9/29/2018|16124.57031
9/30/2018|15986.91992
9/12/2018|999999.91992
output
Month: 8. avg value 26058.503253333332, min value (25986.91992, '8/30/2018'), max value (26124.57031, '8/29/2018')
Month: 9. avg value 209637.68393600002, min value (12.99, '9/10/2018'), max value (999999.91992, '9/12/2018')
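Since the question covers a single year of data, grouping on the month string alone is enough. If the file ever spanned more than one year, the same approach could key the dictionary on (year, month) so that, say, January 2018 and January 2019 stay separate. A minimal sketch of that variant (still standard library only; the (year, month) key and the output format are my own choices):

from collections import defaultdict

# Sketch: same idea as above, but keyed on (year, month) so that data
# from different years never lands in the same bucket.
monthly_data = defaultdict(list)

with open('DJI.txt') as f:
    for line in f:
        date, value = line.strip().split('|')
        month, _, year = date.split('/')              # dates look like 8/28/2018
        monthly_data[(int(year), int(month))].append((float(value), date))

for (year, month), entries in sorted(monthly_data.items()):
    closes = [v[0] for v in entries]
    avg = sum(closes) / len(closes)
    low = min(entries)                                # tuples compare on the close first
    high = max(entries)
    print('{}/{}: avg {:.2f}, min {} on {}, max {} on {}'.format(
        month, year, avg, low[0], low[1], high[0], high[1]))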
Upvotes: 1
Using pandas, this functionality can be achieved fairly easily:
My input file (note the extra month of data to check the monthly averages):
8/28/2018|26064.01953
8/29/2018|26124.57031
8/30/2018|25986.91992
9/28/2018|26064.01953
9/29/2018|25124.57031
9/30/2018|25986.91992
Reading the input file:
>>> import pandas as pd
>>> df = pd.read_csv("input.txt", sep='|', header=None, names=["Date", "Dow-Jones Value"], parse_dates=["Date"])
>>> df
Date Dow-Jones Value
0 2018-08-28 26064.01953
1 2018-08-29 26124.57031
2 2018-08-30 25986.91992
3 2018-09-28 26064.01953
4 2018-09-29 25124.57031
5 2018-09-30 25986.91992
Retrieving statistics:
>>> df['Dow-Jones Value'].mean() # average
25891.836586666668
>>> df.iloc[df['Dow-Jones Value'].idxmax()] # highest
Date 2018-08-29 00:00:00
Dow-Jones Value 26124.6
Name: 1, dtype: object
>>> df.iloc[df['Dow-Jones Value'].idxmin()] # lowest
Date 2018-09-29 00:00:00
Dow-Jones Value 25124.6
Name: 4, dtype: object
>>> df.sort_values('Dow-Jones Value') # sorted by Dow-Jones Value
Date Dow-Jones Value
4 2018-09-29 25124.57031
2 2018-08-30 25986.91992
5 2018-09-30 25986.91992
0 2018-08-28 26064.01953
3 2018-09-28 26064.01953
1 2018-08-29 26124.57031
>>> df.groupby(pd.Grouper(key='Date', freq='M')).mean() # Monthly Averages
Dow-Jones Value
Date
2018-08-31 26058.503253
2018-09-30 25725.169920
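To also report the date on which each month's high and low occurred (the second part of the question), the same grouping can be combined with idxmax/idxmin, which return the row labels of the extreme values per group; a small sketch against the same df:
>>> df.loc[df.groupby(pd.Grouper(key='Date', freq='M'))['Dow-Jones Value'].idxmax()]  # monthly highs with dates
        Date  Dow-Jones Value
1 2018-08-29      26124.57031
3 2018-09-28      26064.01953
>>> df.loc[df.groupby(pd.Grouper(key='Date', freq='M'))['Dow-Jones Value'].idxmin()]  # monthly lows with dates
        Date  Dow-Jones Value
2 2018-08-30      25986.91992
4 2018-09-29      25124.57031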
Upvotes: 1