lordy

Reputation: 630

What is the difference between a numpy array of shape (100, 1) and (100,)?

I have two variables coming from different functions. The first one, a, is:

<class 'numpy.ndarray'>
(100,)

while the other one b is:

<class 'numpy.ndarray'>
(100, 1)

If I try to correlate them via:

from scipy.stats import pearsonr
r, p = pearsonr(a, b)  # pearsonr returns (correlation coefficient, p-value)

I get:

    r = max(min(r, 1.0), -1.0)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

My questions are:

  1. What is the difference between a and b?
  2. How do I fix this?

Upvotes: 3

Views: 5519

Answers (3)

mattsap

Reputation: 3806

You'll need to call reshape on the first one: a = a.reshape((100, 1)). Reshape changes the array's "shape" property, turning the 1D array [1, 2, 3, ..., 100] into the 2D array [[1], [2], [3], ..., [100]]. Note that reshape returns a new array rather than modifying a in place, so assign the result back.
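A minimal sketch of that reshape, using np.arange(1, 101) as stand-in data (an assumption; the question's actual values aren't shown):

```python
import numpy as np

# Stand-in data (assumption): a 1D array with 100 elements, shape (100,)
a = np.arange(1, 101)

# reshape returns a new array object; `a` itself keeps its shape,
# so the result has to be assigned to a name
a_col = a.reshape((100, 1))

print(a.shape)      # (100,)
print(a_col.shape)  # (100, 1)
print(a_col[:3])    # the first three rows, each a length-1 row

# -1 lets NumPy infer that dimension from the total number of elements
a_col2 = a.reshape(-1, 1)
print(a_col2.shape)  # (100, 1)
```

The 100 values are unchanged; only the shape (the number of axes and their lengths) differs.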

Upvotes: 0

NVS Abhilash

Reputation: 577

First question's answer: a is a vector (a 1D array), and b is a matrix (a 2D array with 100 rows and 1 column). See this Stack Overflow question for more details: Difference between numpy.array shape (R, 1) and (R,)
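To make the vector/matrix distinction concrete, here is a small sketch using zero-filled arrays of the two shapes from the question (the zeros are placeholder data):

```python
import numpy as np

a = np.zeros(100)         # shape (100,): a single axis
b = np.zeros((100, 1))    # shape (100, 1): two axes, 100 rows x 1 column

print(a.ndim, b.ndim)     # 1 2

# Indexing one element of `a` gives a scalar; indexing one row of `b`
# gives a length-1 array, because `b` still has its column axis left over
print(np.ndim(a[0]), b[0].shape)  # 0 (1,)

# Transposing a 1D array is a no-op; transposing `b` gives a 1 x 100 row
print(a.T.shape, b.T.shape)       # (100,) (1, 100)
```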

Second question's answer:

I think converting one form to the other should work fine. For the function you are using, I guess it expects vectors, so just reshape b with b = b.reshape(-1), which flattens it to a single dimension (a vector). Look at the example below for reference:

>>> import numpy as np
>>> from scipy.stats import pearsonr
>>> a = np.random.random((100,))
>>> b = np.random.random((100,1))
>>> print(a.shape, b.shape)
(100,) (100, 1)
>>> r, p = pearsonr(a, b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\xyz\Appdata\Local\Continuum\Anaconda3\lib\site-packages\scipy\stats\stats.py", line 3042, in pearsonr
    r = max(min(r, 1.0), -1.0)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> b = b.reshape(-1)
>>> r, p = pearsonr(a, b)
>>> print(r, p)
0.10899671932026986 0.280372238354364
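reshape(-1) is not the only way to drop the length-1 column. The sketch below compares a few standard NumPy alternatives on seeded random data (the seed and data are assumptions) and, to avoid depending on scipy, cross-checks the correlation with np.corrcoef. Its value should agree with the first number pearsonr returns, up to floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded random data (assumption)
a = rng.random(100)              # shape (100,)
b = rng.random((100, 1))         # shape (100, 1)

# Equivalent ways to drop the length-1 column axis of `b`
flat_reshape = b.reshape(-1)      # infer the length from the total size
flat_ravel = b.ravel()            # flatten to 1D
flat_squeeze = b.squeeze(axis=1)  # remove only axis 1; raises if its length != 1
flat_index = b[:, 0]              # take the single column

for v in (flat_reshape, flat_ravel, flat_squeeze, flat_index):
    assert v.shape == (100,)
    assert np.array_equal(v, flat_reshape)

# NumPy's correlation matrix: entry [0, 1] is the Pearson coefficient
# between `a` and the flattened `b`
r = np.corrcoef(a, flat_reshape)[0, 1]
print(r)
```

squeeze(axis=1) is the strictest option: it fails loudly if b unexpectedly has more than one column, whereas reshape(-1) and ravel() would silently flatten everything.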

Upvotes: 3

user8426627

Reputation: 943

(100, 1) is a 2D array whose rows each have length 1, like [[1], [2], [3], [4]], while (100,) is a 1D array like [1, 2, 3, 4]:

import numpy as np

a1 = np.array([[1], [2], [3], [4]])  # shape (4, 1)
a2 = np.array([1, 2, 3, 4])          # shape (4,)
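Mixing these two shapes in element-wise arithmetic does not pair element i with element i; NumPy broadcasts them to a (4, 4) result. The sketch below uses a hypothetical helper, pearson_sketch (written for illustration, not scipy's actual code, which varies by version), to show how that broadcasting can turn a correlation into an array and reproduce the question's ValueError when the result is clipped the way the traceback shows.

```python
import numpy as np

a1 = np.array([[1], [2], [3], [4]])  # shape (4, 1): four rows of length 1
a2 = np.array([1, 2, 3, 4])          # shape (4,): four scalars

# Element-wise ops between (4, 1) and (4,) broadcast to (4, 4)
print((a1 * a2).shape)  # (4, 4)


def pearson_sketch(x, y):
    """Hypothetical Pearson r helper for illustration only (not scipy's code).

    Sums run over axis 0, which covers the whole array when x and y are 1D.
    """
    xm = x - x.mean()
    ym = y - y.mean()
    num = (xm * ym).sum(axis=0)
    den = np.sqrt((xm ** 2).sum(axis=0) * (ym ** 2).sum(axis=0))
    return num / den


bad = pearson_sketch(a1, a2)                # broadcast silently: shape (4,)
good = pearson_sketch(a1.reshape(-1), a2)   # both 1D: a single value

print(np.shape(bad), float(good))  # (4,) 1.0

# Clipping the broadcast result like `max(min(r, 1.0), -1.0)` in the
# traceback forces a truth test on a 4-element array -> ValueError
try:
    max(min(bad, 1.0), -1.0)
except ValueError as err:
    print("ValueError:", err)
```

Here the broadcast "correlation" is an array of zeros even though the data is perfectly correlated, so the shape mismatch produces a wrong answer as well as the error.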

Upvotes: 4
