Reputation: 245
I have a data file in a specific format that also contains some extra lines to ignore during processing. I need to process the data and calculate a value from it.
Sample Data:
Average monthly temperatures in Dubuque, Iowa,
January 1964 through december 1975, n=144
24.7 25.7 30.6 47.5 62.9 68.5 73.7 67.9 61.1 48.5 39.6 20.0
16.1 19.1 24.2 45.4 61.3 66.5 72.1 68.4 60.2 50.9 37.4 31.1
10.4 21.6 37.4 44.7 53.2 68.0 73.7 68.2 60.7 50.2 37.2 24.6
21.5 14.7 35.0 48.3 54.0 68.2 69.6 65.7 60.8 49.1 33.2 26.0
19.1 20.6 40.2 50.0 55.3 67.7 70.7 70.3 60.6 50.7 35.8 20.7
14.0 24.1 29.4 46.6 58.6 62.2 72.1 71.7 61.9 47.6 34.2 20.4
8.4 19.0 31.4 48.7 61.6 68.1 72.2 70.6 62.5 52.7 36.7 23.8
11.2 20.0 29.6 47.7 55.8 73.2 68.0 67.1 64.9 57.1 37.6 27.7
13.4 17.2 30.8 43.7 62.3 66.4 70.2 71.6 62.1 46.0 32.7 17.3
22.5 25.7 42.3 45.2 55.5 68.9 72.3 72.3 62.5 55.6 38.0 20.4
17.6 20.5 34.2 49.2 54.8 63.8 74.0 67.1 57.7 50.8 36.8 25.5
20.4 19.6 24.6 41.3 61.8 68.5 72.0 71.1 57.3 52.5 40.6 26.2
Source of Sample File: http://robjhyndman.com/tsdldata/data/cryer2.dat
Note: Here, rows represent the years and columns represent the months.
I am trying to write a function which returns the average temperature of any month from the given url.
I have tried as below:
def avg_temp_march(f):
    march_temps = []
    # read each line of the file and store the values
    # as floats in a list
    for line in f:
        line = str(line, 'ascii') # now line is a string
        temps = line.split()
        # check that it is not empty.
        if temps != []:
            march_temps.append(float(temps[2]))
    # calculate the average and return it
    return sum(march_temps) / len(march_temps)
avg_temp_march("data5.txt")
But I am getting the following error:
line = str(line, 'ascii')
TypeError: decoding str is not supported
Upvotes: 1
Views: 8492
Reputation: 85442
Using pandas, the code becomes a bit shorter:
import calendar
import pandas as pd
df = pd.read_csv('data5.txt', sep=r'\s+', skiprows=2,
                 names=calendar.month_abbr[1:])
Now for March:
>>> df.Mar.mean()
32.475000000000001
and for all months:
>>> df.mean()
Jan    16.608333
Feb    20.650000
Mar    32.475000
Apr    46.525000
May    58.091667
Jun    67.500000
Jul    71.716667
Aug    69.333333
Sep    61.025000
Oct    50.975000
Nov    36.650000
Dec    23.641667
dtype: float64
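If pandas is not available, the same per-month means can be sketched with the standard library alone. This is only an illustration, not part of the pandas approach above: the sample string holds just the two header lines and the first three data rows from the question, and the names sample, rows and monthly_means are made up for the example.

```python
import calendar

# Two header lines followed by the first three data rows from the question.
sample = """Average monthly temperatures in Dubuque, Iowa,
January 1964 through december 1975, n=144
24.7 25.7 30.6 47.5 62.9 68.5 73.7 67.9 61.1 48.5 39.6 20.0
16.1 19.1 24.2 45.4 61.3 66.5 72.1 68.4 60.2 50.9 37.4 31.1
10.4 21.6 37.4 44.7 53.2 68.0 73.7 68.2 60.7 50.2 37.2 24.6
"""

# Skip the two header lines, then parse every remaining non-blank line.
rows = [
    [float(x) for x in line.split()]
    for line in sample.splitlines()[2:]
    if line.strip()
]

# zip(*rows) transposes the rows (years) into columns (months).
monthly_means = {
    name: sum(column) / len(column)
    for name, column in zip(calendar.month_abbr[1:], zip(*rows))
}

for name, mean in monthly_means.items():
    print(f"{name} {mean:.6f}")
```

With the full file in place of the three-row sample, this should give the same kind of table as df.mean().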
Upvotes: 0
Reputation: 2155
I don't think you need to convert a string to a string: line is already a str, which is why str(line, 'ascii') raises the TypeError.
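A small Python 3 sketch of the difference, assuming the str(line, 'ascii') call was written for bytes lines such as those returned by urlopen. The variable names are only illustrative. Note also that the question calls the function with the filename "data5.txt" itself, so the for loop iterates over the characters of that name rather than over the lines of the file.

```python
raw = b"24.7 25.7 30.6"       # a bytes line, as a binary stream would yield
text = str(raw, "ascii")      # decoding bytes to str is fine
print(text.split())

error_message = None
try:
    str(text, "ascii")        # text is already a str, so there is nothing to decode
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Iterating over a filename gives single characters, not file lines.
print(list("data5.txt")[:3])
```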
I tried to fix your code with some modifications:
def avg_temp_march(f):
    # f is a string read from file
    march_temps = []
    for line in f.split("\n"):
        if line == "":
            continue
        temps = line.split()  # split on any run of whitespace
        month_index = 2  # 0 = January, so 2 = March
        # check that the line has enough columns
        if len(temps) > month_index:
            try:
                march_temps.append(float(temps[month_index]))
            except ValueError as e:
                print(temps)
                print("Skipping line: %s" % e)
    # calculate the average and return it
    return sum(march_temps) / len(march_temps)
Output:
['Average', 'monthly', 'temperatures', 'in', 'Dubuque,', 'Iowa,']
Skipping line: could not convert string to float: temperatures
['January', '1964', 'through', 'december', '1975,', 'n=144']
Skipping line: could not convert string to float: through
32.475
Based on your original question (before latest edits), I think you can solve your problem in this way.
# from urllib2 import urlopen  # Python 2
from urllib.request import urlopen  # Python 3

def avg_temp_march(url):
    f = urlopen(url).read().decode("ascii")  # read() returns bytes in Python 3
    data = f.split("\n")[3:]  # ignore the first 3 lines
    data = [line.split() for line in data if line.strip()]  # ignore the empty lines
    data = [[float(x) for x in line] for line in data]  # convert all numbers to float
    month_index = 2  # 2 for March
    monthly_sum = sum(line[month_index] for line in data)
    monthly_avg = monthly_sum / len(data)
    return monthly_avg

print(avg_temp_march("http://robjhyndman.com/tsdldata/data/cryer2.dat"))
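Since the question asks for any month rather than only March, here is a rough sketch of how the same idea might be generalized. It takes already-decoded text instead of a URL so it can be tried without network access, and it skips header lines by checking whether they parse as numbers rather than by counting them. The name avg_temp and the shortened sample are assumptions made for the illustration.

```python
def avg_temp(text, month_index):
    """Average one column of the data; 0 = January ... 11 = December."""
    values = []
    for line in text.splitlines():
        try:
            row = [float(x) for x in line.split()]
        except ValueError:
            continue  # header or other non-numeric line
        if len(row) > month_index:  # also skips blank lines (empty row)
            values.append(row[month_index])
    if not values:
        raise ValueError("no numeric rows found for that month index")
    return sum(values) / len(values)


# Two header lines plus the first two data rows from the question.
sample = """Average monthly temperatures in Dubuque, Iowa,
January 1964 through december 1975, n=144
24.7 25.7 30.6 47.5 62.9 68.5 73.7 67.9 61.1 48.5 39.6 20.0

16.1 19.1 24.2 45.4 61.3 66.5 72.1 68.4 60.2 50.9 37.4 31.1
"""

print(avg_temp(sample, 2))   # March
print(avg_temp(sample, 0))   # January
```

To use it with the URL, you could pass urlopen(url).read().decode("ascii") as the text argument.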
Upvotes: 3