Nopiforyou

Reputation: 350

Numpy Correlation Error for Python

I am trying to compute the correlation between two separate lists. Before installing Numpy, I parsed World Bank data for GDP values and the number of internet users and stored them in two separate lists. Here is a snippet of the code, just for gdp07. I actually have more lists for other years and for other data such as unemployment.

import numpy as np

file = open('final_gdpnum.txt', 'r')
gdp07 = []
for line in file:
    fields = line.strip().split()
    gdp07.append(fields [0])    

file2 = open('internetnum.txt', 'r')
netnum07 = []
for line in file2:
    fields2 = line.strip().split()
    netnum07.append(fields2 [0])

print np.correlate(gdp07,netnum07,"full")

The error I get is this:

Traceback (most recent call last):
  File "Project3.py", line 83, in <module>
    print np.correlate(gdp07, netnum07, "full")
  File "/usr/lib/python2.6/site-packages/numpy/core/numeric.py", line 645, in correlate
    return multiarray.correlate2(a,v,mode)
ValueError: data type must provide an itemsize

Just for the record, I am using Cygwin with Python 2.6 on a Windows computer. I am only using Numpy along with its dependencies and other parts of its build (gcc compiler). Any help would be great. Thx

Upvotes: 3

Views: 4770

Answers (2)

Try casting the data to float. It works for me!
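
A minimal sketch of that cast, assuming each line holds whitespace-separated fields with the number in the first column, as in the question. The helper name `first_column_as_floats` and the sample lines are made up for illustration.

```python
def first_column_as_floats(lines):
    """Return the first whitespace-separated field of each non-blank line as a float."""
    values = []
    for line in lines:
        fields = line.strip().split()
        if not fields:
            continue  # skip blank lines instead of failing on fields[0]
        values.append(float(fields[0]))
    return values


# Hypothetical sample data standing in for the contents of one of the text files.
gdp_lines = ["1400.5 USA\n", "\n", "320 FRA\n"]
gdp07 = first_column_as_floats(gdp_lines)
print(gdp07)
```

An open file can be passed in place of the list, since iterating over a file yields its lines. The resulting lists of floats can then be handed to np.correlate.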

Upvotes: 0

user1462442

Reputation: 8202

That is probably the error you get when you pass the data in as strings. According to the Python docs, strip() returns a string and split() returns a list of strings, so fields[0] is a string rather than a number.

http://docs.python.org/library/stdtypes.html

Try parsing the data into whatever numeric type you want.

As you can see here, passing strings reproduces the error:

In [14]:np.correlate(["3", "2","1"], [0, 1, 0.5])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/dog/<ipython-input-14-a0b588b9af44> in <module>()
----> 1 np.correlate(["3", "2","1"], [0, 1, 0.5])

/usr/lib64/python2.7/site-packages/numpy/core/numeric.pyc in correlate(a, v, mode, old_behavior)
    643         return multiarray.correlate(a,v,mode)
    644     else:
--> 645         return multiarray.correlate2(a,v,mode)
    646 
    647 def convolve(a,v,mode='full'):

ValueError: data type must provide an itemsize

Try parsing the values first:

In [15]: np.correlate([int("3"), int("2"),int("1")], [0, 1, 0.5])
Out[15]: array([ 2.5])



import numpy as np

# Convert each value to int while reading so NumPy gets numbers, not strings.
file = open('final_gdpnum.txt', 'r')
gdp07 = []
for line in file:
    fields = line.strip().split()
    gdp07.append(int(fields[0]))
file.close()

file2 = open('internetnum.txt', 'r')
netnum07 = []
for line in file2:
    fields2 = line.strip().split()
    netnum07.append(int(fields2[0]))
file2.close()

print np.correlate(gdp07, netnum07, "full")

Your other error is a character encoding problem. I hope this works; I can't reproduce it because my Linux box supports UTF-8 by default. I went by the help(codecs) documentation in IPython and http://code.google.com/edu/languages/google-python-class/dict-files.html

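import codecs

# The third argument of codecs.open is the encoding name. 'utf-8-sig' decodes
# UTF-8 and drops a leading byte order mark (BOM) if one is present.
f = codecs.open('final_gdpnum.txt', 'r', 'utf-8-sig')
gdp07 = []
for line in f:
    fields = line.strip().split()
    gdp07.append(int(fields[0]))
f.close()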
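
To illustrate why the encoding can matter, here is a sketch that assumes the file starts with a UTF-8 byte order mark (BOM). Nothing in the question confirms that this is the cause of the second error. The file contents are invented, and the temporary file is only there to make the example self-contained.

```python
import codecs
import os
import tempfile

# Write a small file that begins with a UTF-8 BOM (made-up data).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(codecs.BOM_UTF8 + b"12 first\n34 second\n")

# Plain 'utf-8' keeps the BOM as an invisible character at the start of line 1.
with codecs.open(path, "r", "utf-8") as f:
    first_plain = f.readline().split()[0]

# 'utf-8-sig' removes the BOM while decoding, so every first field parses cleanly.
with codecs.open(path, "r", "utf-8-sig") as f:
    values = [int(line.split()[0]) for line in f if line.strip()]

os.remove(path)

print(repr(first_plain))  # the first field still carries the BOM character
print(values)
```

If the file has no BOM, 'utf-8-sig' behaves like plain 'utf-8', so it is a reasonably safe default for reading files of this kind.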

Upvotes: 3
