Meranda Fairchild

Reputation: 21

Python: Finding the average stock value for each month

Basically I have a list of tuples that each include a date and a price, something like:

[ ("2013-02-12", 200.0), ("2012-02-25", 300.0), ("2000-03-04", 100.0), ("2000-03-05", 50.0)]

The function needs to find the average stock value for each month, then return a list of tuples including the date (month and year) and the stock price. Something like:

[(250.0, "02-2013"), (100.0, "03-2000"), (50.0, "03-2000")]

Here is the code I have so far:

def average_data(list_of_tuples = []):

    list_of_averages = []
    current_year_int = 2013
    current_month_int = 2
    sum_float = float()
    count = 0
    for dd_tuple in list_of_tuples:
        date_str = dd_tuple[0]
        data_float = dd_tuple[1]
        date_list = date_str.split("-")
        year_int = int(date_list[0])
        month_int = int(date_list[1])
        date_year_str = "Date: " + str(month_int) + "-" + str(year_int);


        if month_int != current_month_int:
            average_float = sum_float / count
            average_list = [date_year_str, average_float]
            average_tuple = tuple(average_list)
            list_of_averages.append(average_tuple)
            current_month_int = month_int
            sum_float += data_float


        sum_float += data_float
        count += 1
        current_month_int = month_int
        current_year_int = year_int


    return list_of_averages

It returns an average, but not the right ones, and perhaps not all of them? I have tried looking at examples on the internet and asking my TA (this is for a python class) but to no avail. Could someone point me in the right direction?

Edit: Based on the suggestion, the if statement should now look like this, correct?

    if month_int != current_month_int:
        average_float = sum_float / count
        average_list = [date_year_str, average_float]
        average_tuple = tuple(average_list)
        list_of_averages.append(average_tuple)
        current_month_int = month_int
        sum_float = 0.0
        count = 0
        sum_float += data_float
        count += 1

Edit: Thanks for the help everyone! I've got the code running now.

Upvotes: 2

Views: 4596

Answers (4)

Ashwini Chaudhary

Reputation: 251196

>>> lis = [ ("2013-02-12", 200.0), ("2012-02-25", 300.0), ("2000-03-04", 100.0), ("2000-03-05", 50.0)]
>>> from collections import defaultdict
>>> dic = defaultdict(list)
>>> for k,val in lis:
...     key = "-".join(k.split('-')[:-1][::-1])
...     dic[key].append(val)
...
>>> [(sum(v)/float(len(v)),k)  for k,v in dic.items()]

[(200.0, '02-2013'), (300.0, '02-2012'), (75.0, '03-2000')]

A simpler version of the above code:

lis = [ ("2013-02-12", 200.0), ("2012-02-25", 300.0), ("2000-03-04", 100.0), ("2000-03-05", 50.0)]
dic = {}
for date, val in lis:
    #split the date string at '-' and assign the first  2 items to  year,month
    year, month = date.split('-')[:2]
    #now check if (month,year) is there in the dict
    if (month, year) not in dic:
        #if the tuple was not found then initialise one with an empty list
        dic[month,year] = []

    dic[month,year].append(val) # append val to the (month,year) key

print dic
#Now iterate over key,value items and do some calculations to get the desired output
sol =[]
for key, val in dic.items():
    new_key = "-".join(key)
    avg = sum(val) / len(val)
    sol.append((avg, new_key))
print sol

output:

#print dic
{('03', '2000'): [100.0, 50.0],
 ('02', '2013'): [200.0],
 ('02', '2012'): [300.0]}
#print sol
[(75.0, '03-2000'), (200.0, '02-2013'), (300.0, '02-2012')]

Upvotes: 2

Jeff Tratner

Reputation: 17126

I'm never sure with homework problems, but how about I get you part of the way there by using a dict. I've tried to keep the example simple so it's easy to understand what's going on.

monthly_prices = {}
for dd_tuple in list_of_tuples:
    date, price = dd_tuple
    year, month, _ = date.split("-")
    # this will be a list
    curr_prices = monthly_prices.setdefault((year, month), [])
    curr_prices.append(price)

This gets you a mapping of (year, month) tuples to a list of prices. Try going from there.

setdefault checks whether a key already exists in the mapping; if it doesn't, it sets the key to the default value, and either way it returns the value stored under that key. (A defaultdict is in essence some nice syntactic sugar around this, and it avoids having to create a new empty list on every iteration.)
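
To make the comparison concrete, here is a small sketch that builds the same mapping both ways. The sample data is taken from the question; the names with_setdefault and with_defaultdict are only for illustration, and the averaging step is still left to you.

```python
from collections import defaultdict

list_of_tuples = [("2013-02-12", 200.0), ("2012-02-25", 300.0),
                  ("2000-03-04", 100.0), ("2000-03-05", 50.0)]

# Plain dict: setdefault inserts [] the first time a key is seen and
# returns the list stored under that key either way.
with_setdefault = {}
for date, price in list_of_tuples:
    year, month, _ = date.split("-")
    with_setdefault.setdefault((year, month), []).append(price)

# defaultdict: a missing key is filled in by calling list() on demand.
with_defaultdict = defaultdict(list)
for date, price in list_of_tuples:
    year, month, _ = date.split("-")
    with_defaultdict[(year, month)].append(price)

# Both approaches end up with the same mapping.
assert with_setdefault == dict(with_defaultdict)
```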

Upvotes: 1

Chris Doggett

Reputation: 20787

Let's add a duplicate date to your example, so we can actually see some averaging:

l = [ ("2013-02-12", 200.0), ("2012-02-25", 300.0), ("2000-03-04", 100.0), ("2000-03-05", 50.0), ("2013-02-12", 100.0)]

"2013-02-12" shows up twice, totalling 300.0, so it should average out to 150.0.

I don't know if you've learned about dictionaries or better yet, defaultdict, but that's what I'm using. With defaultdict, you can specify in the constructor what should be returned if the key isn't found:

from collections import defaultdict

d = defaultdict(float) # we'll use this to keep a running sum per date
d_count = defaultdict(int) # this one will keep track of how many times the date shows up

We could also use a collections.Counter to keep count, but we'd have to iterate over the list an extra time, which isn't great for speed with a huge list.
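
For reference, a rough sketch of what that Counter pass might look like. It repeats the list l from the top of this answer so it runs on its own; date_counts is an illustrative name, not something used elsewhere in the answer.

```python
from collections import Counter

l = [("2013-02-12", 200.0), ("2012-02-25", 300.0), ("2000-03-04", 100.0),
     ("2000-03-05", 50.0), ("2013-02-12", 100.0)]

# One extra pass over the list that looks only at the dates.
date_counts = Counter(date for date, _ in l)
```

This gives the same counts as the d_count dictionary, at the cost of walking the list a second time to build the sums.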

Now you'll want to go over the list, and add the values to the dictionary using the date as the key:

for k,v in l:
    d[k] += v # add the value
    d_count[k] += 1 # increment the count

So you should now have two dictionaries, that look like this:

>>> d
defaultdict(<type 'float'>, {'2013-02-12': 300.0, '2012-02-25': 300.0, '2000-03-05': 50.0, '2000-03-04': 100.0})

>>> d_count
defaultdict(<type 'int'>, {'2013-02-12': 2, '2012-02-25': 1, '2000-03-05': 1, '2000-03-04': 1})

Now, since both dictionaries have the same keys, you can iterate over the items in the dictionary, and divide the value for a date by the count for that date, to give you the average by date.

for k,v in d.iteritems():
    d[k] /= d_count[k]

"d" should now contain your final averages by date:

>>> d
defaultdict(<type 'float'>, {'2013-02-12': 150.0, '2012-02-25': 300.0, '2000-03-05': 50.0, '2000-03-04': 100.0})

>>> d['2013-02-12']
150.0

>>> for k,v in d.iteritems():
...     print k, v
...

2013-02-12 150.0
2012-02-25 300.0
2000-03-05 50.0
2000-03-04 100.0
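
Note that these averages are per date, while the question asks for one average per month. A minimal sketch of the same running-sum idea keyed by month instead, assuming every date is a "YYYY-MM-DD" string; the names monthly_sum, monthly_count and monthly_avg are illustrative.

```python
from collections import defaultdict

l = [("2013-02-12", 200.0), ("2012-02-25", 300.0), ("2000-03-04", 100.0),
     ("2000-03-05", 50.0), ("2013-02-12", 100.0)]

monthly_sum = defaultdict(float)   # running sum per month
monthly_count = defaultdict(int)   # number of prices seen per month

for date, value in l:
    year, month, _day = date.split("-")
    key = month + "-" + year       # e.g. "02-2013", as in the question
    monthly_sum[key] += value
    monthly_count[key] += 1

# (average, "MM-YYYY") tuples; the order follows the dict and is not sorted.
monthly_avg = [(monthly_sum[key] / monthly_count[key], key)
               for key in monthly_sum]
```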

Upvotes: 1

Anantha Krishnan

Reputation: 541

Inside the if block, sum_float and count are never reset to 0, so as the program proceeds, each average ends up covering multiple months. Resetting both there should solve your problem. One more point about your logic: are you sure your list of tuples is sorted? If it isn't, comparing each month with the previous one will not group the entries correctly.
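
A sketch of how those two fixes might look together, assuming "YYYY-MM-DD" date strings (which sort chronologically as plain strings). This is one possible rewrite, not the asker's final code. It also compares year and month together, appends the last month after the loop (the original never did), and uses the (average, "MM-YYYY") layout from the question's expected output.

```python
def average_data(list_of_tuples):
    list_of_averages = []
    current_key = None   # (year, month) of the group being summed
    sum_float = 0.0
    count = 0

    # Sort first so that all entries of one month are adjacent.
    for date_str, data_float in sorted(list_of_tuples):
        year_str, month_str, _day = date_str.split("-")
        key = (year_str, month_str)

        if key != current_key:
            if count:
                # Month changed: store the finished month's average.
                label = current_key[1] + "-" + current_key[0]
                list_of_averages.append((sum_float / count, label))
            # Reset the running totals for the new month.
            current_key = key
            sum_float = 0.0
            count = 0

        sum_float += data_float
        count += 1

    if count:
        # The last month never triggers the check above, so add it here.
        label = current_key[1] + "-" + current_key[0]
        list_of_averages.append((sum_float / count, label))

    return list_of_averages
```

Called with the list from the question, this groups the two March 2000 prices together and keeps February 2012 and February 2013 separate.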

Upvotes: 0
