Reputation: 1541
I have created this array as an example:
a = np.array([[1, 1, 1, 1, 2], [2, 2, 2, 3], [3, 3, 3, 4], [13, 49, 13, 49], [10, 10, 2, 2],
[11, 1, 1, 1, 2], [22, 2, 2, 3], [33, 3, 3, 4], [133, 49, 13, 49], [100, 10, 2, 2],
[5, 1, 1, 1, 2], [32, 2, 2, 3], [322, 3, 3, 4], [13222, 49, 13, 49], [130, 10, 2, 2]])
I wanted to create a 2D array, in this case a 15*5 array.
However, a.shape returns (15,).
What is wrong with my array definition?
Upvotes: 2
Views: 4065
Reputation: 51
The problem in your example is that some rows have 5 elements and the other rows have 4. As a result, NumPy cannot create a 15*5 array from it. In your case it complains:
Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated.
According to Terasa, there is "some fundamental mistreatment/misinterpretation of the original data". So the first thing you have to do is find out what went wrong with the data.
If it is safe to do so, and the only goal is to get a matrix, one solution is to let pandas handle the missing values and then convert the DataFrame into a NumPy matrix. pandas fills the missing fifth entry of each short row with NaN, so the result is a float array.
The code is:
import numpy as np
import pandas as pd
df = pd.DataFrame([[1, 1, 1, 1, 2], [2, 2, 2, 3], [3, 3, 3, 4], [13, 49, 13, 49], [10, 10, 2, 2],
[11, 1, 1, 1, 2], [22, 2, 2, 3], [33, 3, 3, 4], [133, 49, 13, 49], [100, 10, 2, 2],
[5, 1, 1, 1, 2], [32, 2, 2, 3], [322, 3, 3, 4], [13222, 49, 13, 49], [130, 10, 2, 2]])
a = df.to_numpy()
You can directly run the code to see the generated matrix.
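For comparison, the same kind of padding can be sketched without pandas. This is only an illustration on the first three rows, assuming NaN is an acceptable placeholder for the missing values; the names rows and padded are made up for this example.

```python
from itertools import zip_longest

import numpy as np

rows = [[1, 1, 1, 1, 2], [2, 2, 2, 3], [3, 3, 3, 4]]

# zip_longest walks the rows column by column and fills the gaps with NaN;
# transposing turns those columns back into rows.
padded = np.array(list(zip_longest(*rows, fillvalue=np.nan))).T

print(padded.shape)  # (3, 5)
print(padded.dtype)  # float64
```

Note that the integers become floats, because NaN only exists as a floating-point value.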
Upvotes: 1
Reputation: 19322
NumPy can only build a multi-dimensional array when every sublist along an axis has the same number of elements. Otherwise, you are left with a 1D array of objects.
This is what is happening with your array. You have a list of lists whose sublists contain a variable number of elements (some 4 and some 5). During conversion this becomes a (15,) NumPy array holding 15 separate list objects.
a = np.array([[1, 1, 1, 1, 2], [2, 2, 2, 3], [3, 3, 3, 4], [13, 49, 13, 49]
##            |_____________|  |__________|
##                   |              |
##               5 length        4 length
#Variable length sublists
print(np.array([[1,2,3], [4,5]]))
#Fixed length sublists
print(np.array([[1,2,3], [4,5,6]]))
array([list([1, 2, 3]), list([4, 5])], dtype=object) #This is (2,)
array([[1, 2, 3], #This is (2,3)
[4, 5, 6]])
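As far as I know, NumPy 1.24 and later raise a ValueError for the variable-length case instead of emitting the deprecation warning, unless dtype=object is passed. A minimal sketch that makes the ragged case explicit, so it should behave the same across versions (the names ragged and fixed are just for illustration):

```python
import numpy as np

# Passing dtype=object tells NumPy that an array of list objects is intended.
ragged = np.array([[1, 2, 3], [4, 5]], dtype=object)
fixed = np.array([[1, 2, 3], [4, 5, 6]])

print(ragged.shape, ragged.dtype)  # (2,) object
print(fixed.shape)                 # (2, 3)
```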
Upvotes: 6
Reputation: 5935
Check what your array really is:
import numpy as np
a = np.array([[1, 1, 1, 1, 2], [2, 2, 2, 3], [3, 3, 3, 4], [13, 49, 13, 49], [10, 10, 2, 2],
[11, 1, 1, 1, 2], [22, 2, 2, 3], [33, 3, 3, 4], [133, 49, 13, 49], [100, 10, 2, 2],
[5, 1, 1, 1, 2], [32, 2, 2, 3], [322, 3, 3, 4], [13222, 49, 13, 49], [130, 10, 2, 2]])
print(repr(a))
It is an array of list objects:
array([list([1, 1, 1, 1, 2]), list([2, 2, 2, 3]), list([3, 3, 3, 4]),
list([13, 49, 13, 49]), list([10, 10, 2, 2]),
list([11, 1, 1, 1, 2]), list([22, 2, 2, 3]), list([33, 3, 3, 4]),
list([133, 49, 13, 49]), list([100, 10, 2, 2]),
list([5, 1, 1, 1, 2]), list([32, 2, 2, 3]), list([322, 3, 3, 4]),
list([13222, 49, 13, 49]), list([130, 10, 2, 2])], dtype=object)
The problem is that the number of elements in each sublist varies (some of them 4, some of them 5), so it can only be stored as a 1D array of lists. To get a 2D array, the number of elements in your sublists must be equal.
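One way to catch this early is to check the sublist lengths before building the array. The helper below is a hypothetical sketch (to_2d is not a NumPy function), assuming you would rather get an error than a 1D array of lists:

```python
import numpy as np


def to_2d(rows):
    """Return a 2D array, or raise if the rows differ in length."""
    lengths = {len(row) for row in rows}
    if len(lengths) > 1:
        raise ValueError(f"rows have different lengths: {sorted(lengths)}")
    return np.array(rows)


print(to_2d([[1, 2, 3], [4, 5, 6]]).shape)  # (2, 3)

try:
    to_2d([[1, 2, 3], [4, 5]])
except ValueError as err:
    print(err)  # rows have different lengths: [2, 3]
```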
Upvotes: 4
Reputation: 114240
Take a look at a.dtype: it's O for object. That generally happens when you try to enter a ragged array. In this case, you can see that only the sublists at indices 0, 5, 10 actually have five elements. All the other sublists have four elements. If you want to have a (15, 5) array, you need to make sure that each row has 5 elements.
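To see which rows are the odd ones out, you can count the lengths on the plain Python list before it ever reaches NumPy. A small sketch using the data from the question (the variable names are only illustrative):

```python
from collections import Counter

rows = [[1, 1, 1, 1, 2], [2, 2, 2, 3], [3, 3, 3, 4], [13, 49, 13, 49], [10, 10, 2, 2],
        [11, 1, 1, 1, 2], [22, 2, 2, 3], [33, 3, 3, 4], [133, 49, 13, 49], [100, 10, 2, 2],
        [5, 1, 1, 1, 2], [32, 2, 2, 3], [322, 3, 3, 4], [13222, 49, 13, 49], [130, 10, 2, 2]]

# How many rows there are of each length.
length_counts = Counter(len(row) for row in rows)
print(length_counts)  # Counter({4: 12, 5: 3})

# Positions of the rows that have five elements.
long_rows = [i for i, row in enumerate(rows) if len(row) == 5]
print(long_rows)  # [0, 5, 10]
```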
Upvotes: 3