Reputation: 191
I am trying to calculate the correlations and distances below, but I get an error when the arrays are not the same size. I know I can handle differently sized arrays manually, but can you help me correct this code?
import scipy
from scipy.stats import pearsonr, spearmanr
from scipy.spatial import distance
x = [5,3,2.5]
y = [4,3,2,4,3.5,4]
pearsonr(x, y)
# Error
scipy.spatial.distance.euclidean(x, y)
# Error
spearmanr(x, y)
# Error
scipy.spatial.distance.jaccard(x, y)
# Error
Upvotes: 1
Views: 827
Reputation: 152850
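To compute pairwise distances with scipy.spatial.distance.cdist, both inputs must be 2-dimensional, even if each sub-array contains just one element. For example:
def make2d(lst):
    # Wrap every value in its own list so each one becomes a 1-element point.
    return [[i] for i in lst]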
>>> scipy.spatial.distance.cdist(make2d([5,3,2.5]), make2d([4,3,2,4,3.5,4]))
array([[ 1. , 2. , 3. , 1. , 1.5, 1. ],
[ 1. , 0. , 1. , 1. , 0.5, 1. ],
[ 1.5, 0.5, 0.5, 1.5, 1. , 1.5]])
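To make it clearer what that matrix contains: with one-element points and the default Euclidean metric, entry (i, j) is just the absolute difference between the i-th value of the first list and the j-th value of the second. Here is a rough pure-Python sketch of that idea. The name `pairwise_abs_diff` is made up for illustration and is not part of SciPy.

```python
def pairwise_abs_diff(xs, ys):
    """Return a len(xs) x len(ys) nested list of |x - y| values.

    Hypothetical helper: for one-element points this equals the
    Euclidean distance between every x and every y.
    """
    return [[abs(x - y) for y in ys] for x in xs]


x = [5, 3, 2.5]
y = [4, 3, 2, 4, 3.5, 4]

# One row per element of x, one column per element of y.
for row in pairwise_abs_diff(x, y):
    print(row)
```

Because every element of one list is compared with every element of the other, the two lists do not need the same length.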
You can choose a different metric (like jaccard):
>>> scipy.spatial.distance.cdist(make2d([5,3,2.5]), make2d([4,3,2,4,3.5,4]), metric='jaccard')
array([[ 1., 1., 1., 1., 1., 1.],
[ 1., 0., 1., 1., 1., 1.],
[ 1., 1., 1., 1., 1., 1.]])
But for the statistics functions I have no idea how you want that to work. Correlation coefficients are computed over paired observations, so they require same-length arrays by definition. You may need to consult the documentation for pearsonr and spearmanr.
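To show why the lengths have to match, here is a minimal pure-Python sketch of the textbook Pearson formula. It is not SciPy's implementation, and the function name `pearson` is only illustrative. Every term in the sums pairs x[i] with y[i], so an element without a partner has nothing to be multiplied with.

```python
import math


def pearson(x, y):
    """Textbook Pearson correlation for two equal-length sequences.

    Illustrative sketch only; not how scipy.stats.pearsonr is implemented.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Each term multiplies the deviations of one (x[i], y[i]) pair.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Note: this divides by zero if either input is constant.
    return cov / math.sqrt(var_x * var_y)


print(pearson([1, 2, 3], [2, 4, 6]))

try:
    pearson([5, 3, 2.5], [4, 3, 2, 4, 3.5, 4])
except ValueError as exc:
    print("Error:", exc)
```

If you really need a correlation here, you first have to decide which elements of x correspond to which elements of y. That depends on what your data means, not on the library.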
Upvotes: 1