Reputation: 181
I am trying to plot a graph from a CSV file with the following Python code;
import csv
import matplotlib.pyplot as plt
def population_dict(filename):
"""
Reads the population from a CSV file, containing
years in column 2 and population / 1000 in column 3.
@param filename: the filename to read the data from
@return dictionary containing year -> population
"""
dictionary = {}
with open(filename, 'r') as f:
reader = csv.reader(f)
f.next()
for row in reader:
dictionary[row[2]] = row[3]
return dictionary
dict_for_plot = population_dict('population.csv')
def plot_dict(dict_for_plot):
x_list = []
y_list = []
for data in dict_for_plot:
x = data
y = dict_for_plot[data]
x_list.append(x)
y_list.append(y)
plt.plot(x_list, y_list, 'ro')
plt.ylabel('population')
plt.xlabel('year')
plt.show()
plot_dict(dict_for_plot)
def grow_rate(data_dict):
# fill lists
growth_rates = []
x_list = []
y_list = []
for data in data_dict:
x = data
y = data_dict[data]
x_list.append(x)
y_list.append(y)
# calc grow_rate
for i in range(0, len(y_list)-1):
var = float(y_list[i+1]) - float(y_list[i])
var = var/y_list[i]
print var
growth_rates.append(var)
# growth_rate_dict = dict(zip(years, growth_rates))
grow_rate(dict_for_plot)
However, I'm getting a rather weird error on executing this code
Traceback (most recent call last):
File "/home/jharvard/Desktop/pyplot.py", line 71, in <module>
grow_rate(dict_for_plot)
File "/home/jharvard/Desktop/pyplot.py", line 64, in grow_rate
var = var/y_list[i]
TypeError: unsupported operand type(s) for /: 'float' and 'str'
I've been trying different methods to cast the y_list
variable. For example; casting an int.
How can I solve this problem so I can get the percentage of the grow rate through the years to plot this.
Upvotes: 0
Views: 153
Reputation: 5373
Since CSV files are text files, you will need to convert them into numbers. Its easy to correct for the syntax error. Just use
var/float(y_list[i])
Even though that gets rid of the syntax error, there is a minor bug which is a little more difficult to spot, which may result in incorrect results under some circumstances. The main reason being that dictionaries are not ordered. i.e. the x and y values are not ordered in any way. The indentation for your program appears to be a bit off on my computer, so am unable to follow it exactly. But the gist of it appears to be that you are obtaining values from a file (x, and y values) and then finding the sequence
var[i] = (y[i+1] - y[i]) / y[i]
Unfortunately, your y_list[i]
may not be in the same sequence as in the CSV file because, it is being populated from a dictionary.
In the section where you did:
for row in reader:
dictionary[row[2]] = row[3]
it is just better to preserve the order by doing
x, y = zip(*[ ( float(row[2]), float(row[3]) ) for row in reader])
x, y = map(numpy.array, [x, y])
return x, y
or something like this ...
Then, Numpy arrays have methods for handling your problem much more efficiently. You can then simply do:
growth_rates = numpy.diff(y) / y[:-1]
Hope this helps. Let me know if you have any questions.
Finally, if you do go the Numpy route, I would highly recommend its own csv reader. Check it out here: http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html
Upvotes: 1