Calculating and plotting a grow rate in years from a dictionary

Question

I am trying to plot a graph from a CSV file with the following Python code:

import csv
import matplotlib.pyplot as plt

def population_dict(filename):
    """
    Reads the population from a CSV file, containing
    years in column 2 and population / 1000 in column 3.

    @param filename: the filename to read the data from
    @return dictionary containing year -> population
    """
    dictionary = {}
    with open(filename, 'r') as f:
        reader = csv.reader(f)
        f.next()
        for row in reader:
            dictionary[row[2]] = row[3]
    return dictionary

dict_for_plot = population_dict('population.csv')

def plot_dict(dict_for_plot):

    x_list = []
    y_list = []
    for data in dict_for_plot:
        x = data
        y = dict_for_plot[data]
        x_list.append(x)
        y_list.append(y)
    plt.plot(x_list, y_list, 'ro')
    plt.ylabel('population')
    plt.xlabel('year')
    plt.show()

plot_dict(dict_for_plot)

def grow_rate(data_dict):
    # fill lists
    growth_rates = []
    x_list = []
    y_list = []
    for data in data_dict:
        x = data
        y = data_dict[data]
        x_list.append(x)
        y_list.append(y)

    # calc grow_rate
    for i in range(0, len(y_list)-1):
        var = float(y_list[i+1]) - float(y_list[i])
        var = var/y_list[i]
        print var
        growth_rates.append(var)

    # growth_rate_dict = dict(zip(years, growth_rates))


grow_rate(dict_for_plot)

However, I'm getting a rather weird error when I execute this code:

Traceback (most recent call last):
  File "/home/jharvard/Desktop/pyplot.py", line 71, in <module>
    grow_rate(dict_for_plot)
  File "/home/jharvard/Desktop/pyplot.py", line 64, in grow_rate
    var = var/y_list[i]
TypeError: unsupported operand type(s) for /: 'float' and 'str'

I've been trying different methods to cast the y_list variable, for example casting it to an int.

How can I solve this problem so that I can get the growth rate as a percentage over the years and plot it?

ssm · Accepted Answer

Since CSV files are text files, every value you read from them is a string, and you need to convert those strings to numbers. The TypeError is easy to fix. Just use

var/float(y_list[i])

That gets rid of the TypeError, but there is a subtler bug that can produce incorrect results under some circumstances. In Python 2, which your code is written for, dictionaries are not ordered, so the x and y values you pull out of one do not come back in any particular order. The indentation of your program appears to be a bit off on my computer, so I am unable to follow it exactly, but the gist seems to be that you read x and y values from a file and then compute the sequence

var[i] = (y[i+1] - y[i]) / y[i]

Unfortunately, your y_list[i] may not be in the same sequence as in the CSV file, because it is populated from a dictionary.
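If you would rather keep the dictionary, one way to make the order explicit is to sort the keys before computing anything. The following is only a minimal sketch: the function name growth_rates_by_year and the sample data are made up for illustration, and it assumes the keys are year strings that can be converted with int.

```python
def growth_rates_by_year(population):
    """Return (years, rates); rates[i] is the relative change
    from years[i] to years[i + 1]."""
    # Sort the keys numerically so the result no longer depends
    # on the order in which the dictionary yields them.
    years = sorted(population, key=int)
    # CSV values are strings, so convert each one exactly once.
    values = [float(population[year]) for year in years]
    # Pair each value with its successor: (next - current) / current.
    rates = [(nxt - cur) / cur for cur, nxt in zip(values, values[1:])]
    return years, rates


# Hypothetical sample data, deliberately inserted out of order.
sample = {'2002': '121.0', '2000': '100.0', '2001': '110.0'}
years, rates = growth_rates_by_year(sample)
```

Note that rates has one element fewer than years, because the first year has no predecessor to compare against.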

In the section where you did:

   for row in reader:
       dictionary[row[2]] = row[3]

it is better to preserve the order by doing

# needs "import numpy" at the top of the file
x, y = zip(*[(float(row[2]), float(row[3])) for row in reader])
x, y = map(numpy.array, [x, y])
return x, y

or something like this ...

NumPy arrays then let you handle your problem much more efficiently. You can simply do:

growth_rates = numpy.diff(y) / y[:-1]
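To see what that one-liner computes, here is a small worked example. The population values are invented for illustration and are not taken from your file.

```python
import numpy

# Hypothetical population values, already in year order.
y = numpy.array([100.0, 110.0, 121.0])

# numpy.diff(y) gives y[i+1] - y[i]; y[:-1] drops the last element
# so that both arrays have the same length.
growth_rates = numpy.diff(y) / y[:-1]

# Multiply by 100 to express the rates as percentages.
percent = 100 * growth_rates
```

Because growth_rates has one element fewer than y, it should be plotted against x[1:] rather than x, for example with plt.plot(x[1:], percent, 'ro').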

Hope this helps. Let me know if you have any questions.

Finally, if you do go the NumPy route, I would highly recommend its own CSV reader, genfromtxt. Check it out here: http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html
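As a rough sketch of how that could look: the CSV layout below (one header row, year in column 2, population in column 3, counting from zero) is an assumption based on your docstring, and the contents are invented. An in-memory string stands in for population.csv so that the example is self-contained; passing the filename instead should work the same way.

```python
import io
import numpy

# Hypothetical file contents standing in for population.csv.
csv_text = u"""country,code,year,population
X,XX,2000,100.0
X,XX,2001,110.0
X,XX,2002,121.0
"""

# skip_header=1 skips the header row, usecols picks the two numeric
# columns, and unpack=True returns one array per selected column.
x, y = numpy.genfromtxt(
    io.StringIO(csv_text),
    delimiter=',',
    skip_header=1,
    usecols=(2, 3),
    unpack=True,
)

growth_rates = numpy.diff(y) / y[:-1]
```

The rows come back in file order and the values are already floats, so neither the dictionary nor the explicit float conversions are needed.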
