Reputation: 350
I am trying to show the correlation between two separate lists. Before installing NumPy, I parsed World Bank data for GDP values and the number of internet users and stored them in two separate lists. Here is a snippet of the code. This is just for gdp07; I actually have more lists for other years and for other data such as unemployment.
import numpy as np
file = open('final_gdpnum.txt', 'r')
gdp07 = []
for line in file:
    fields = line.strip().split()
    gdp07.append(fields[0])
file2 = open('internetnum.txt', 'r')
netnum07 = []
for line in file2:
    fields2 = line.strip().split()
    netnum07.append(fields2[0])
print np.correlate(gdp07,netnum07,"full")
The error I get is this:
Traceback (most recent call last):
  File "Project3.py", line 83, in <module>
    print np.correlate(gdp07, netnum07, "full")
  File "/usr/lib/python2.6/site-packages/numpy/core/numeric.py", line 645, in correlate
    return multiarray.correlate2(a,v,mode)
ValueError: data type must provide an itemsize
Just for the record, I am using Cygwin with Python 2.6 on a Windows computer. I am only using NumPy along with its dependencies and other parts of its build (the gcc compiler). Any help would be great. Thanks.
Upvotes: 3
Views: 4770
Reputation: 1
Try casting the data to float. It works for me!
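A minimal sketch of that cast, assuming the values arrive as strings the way they do after split(). The sample numbers and the names gdp_strings, net_values and gdp_values are hypothetical stand-ins for the parsed file contents:

```python
import numpy as np

# Hypothetical sample values standing in for the fields read from the files.
gdp_strings = ["3", "2", "1"]
net_values = [0, 1, 0.5]

# Convert every string to a float before passing the list to NumPy.
gdp_values = [float(s) for s in gdp_strings]

# With numeric input, np.correlate no longer raises the ValueError.
result = np.correlate(gdp_values, net_values)
print(result)
```

The same list comprehension can be applied to each list once the files have been read.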
Upvotes: 0
Reputation: 8202
That error probably occurs because you are passing the data in as strings: according to the Python docs, strip() returns a string (and split() then returns a list of strings).
http://docs.python.org/library/stdtypes.html
Try parsing the data into whatever numeric type you want.
As you can see here, passing strings reproduces the error:
In [14]:np.correlate(["3", "2","1"], [0, 1, 0.5])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/home/dog/<ipython-input-14-a0b588b9af44> in <module>()
----> 1 np.correlate(["3", "2","1"], [0, 1, 0.5])
/usr/lib64/python2.7/site-packages/numpy/core/numeric.pyc in correlate(a, v, mode, old_behavior)
643 return multiarray.correlate(a,v,mode)
644 else:
--> 645 return multiarray.correlate2(a,v,mode)
646
647 def convolve(a,v,mode='full'):
ValueError: data type must provide an itemsize
Try parsing the values first:
In [15]: np.correlate([int("3"), int("2"),int("1")], [0, 1, 0.5])
Out[15]: array([ 2.5])
import numpy as np
file = open('final_gdpnum.txt', 'r')
gdp07 = []
for line in file:
    fields = line.strip().split()
    # convert the string to a number; use float() if the values have decimals
    gdp07.append(int(fields[0]))
file2 = open('internetnum.txt', 'r')
netnum07 = []
for line in file2:
    fields2 = line.strip().split()
    netnum07.append(int(fields2[0]))
print np.correlate(gdp07,netnum07,"full")
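Since the same read-and-convert loop is repeated for every list, it could be factored into a helper. The sketch below is only one way to do it: read_first_column and pearson are hypothetical names, float() is used instead of int() on the assumption that some values may have decimals, and np.corrcoef is included on the assumption that a single correlation coefficient is what is wanted (np.correlate returns a cross-correlation sequence instead).

```python
import numpy as np

def read_first_column(path):
    """Return the first whitespace-separated field of each non-empty line as a float."""
    values = []
    with open(path, 'r') as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                values.append(float(fields[0]))
    return values

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient.
    return np.corrcoef(x, y)[0, 1]
```

With the files from the question this would be called as pearson(read_first_column('final_gdpnum.txt'), read_first_column('internetnum.txt')), provided both files have the same number of rows.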
Your other error is a character-encoding problem. I hope this works; I can't reproduce it, because my Linux box uses UTF-8 by default. I went by the IPython help(codecs) documentation and http://code.google.com/edu/languages/google-python-class/dict-files.html
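import codecs
# the third argument is an encoding name; 'utf-8-sig' skips a leading byte order mark (BOM) if present
f = codecs.open('final_gdpnum.txt', 'r', 'utf-8-sig')
for line in f:
    fields = line.strip().split()
    gdp07.append(int(fields[0]))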
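The third argument of codecs.open is an encoding name, so for a file that may start with a UTF-8 byte order mark the 'utf-8-sig' codec is a reasonable choice. Below is a small self-contained sketch; the temporary file and its contents are made up for illustration and are not the World Bank data:

```python
import codecs
import os
import tempfile

# Create a hypothetical data file that begins with a UTF-8 byte order mark.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as handle:
    handle.write(codecs.BOM_UTF8 + b"100 x\n200 y\n")

values = []
# 'utf-8-sig' decodes UTF-8 and drops the leading BOM, so the first field parses cleanly.
with codecs.open(path, 'r', 'utf-8-sig') as handle:
    for line in handle:
        fields = line.strip().split()
        values.append(int(fields[0]))

os.remove(path)
print(values)
```

Opening the same file without an encoding can leave the BOM attached to the first field, which may make int() fail on that line.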
Upvotes: 3