Kadaj13

Reputation: 1541

Numpy's "shape" function returns a 1D value for a 2D array

I have created this array as an example:

a = np.array([[1, 1, 1, 1, 2], [2, 2, 2, 3], [3, 3, 3, 4], [13, 49, 13, 49], [10, 10, 2, 2],
             [11, 1, 1, 1, 2], [22, 2, 2, 3], [33, 3, 3, 4], [133, 49, 13, 49], [100, 10, 2, 2],
             [5, 1, 1, 1, 2], [32, 2, 2, 3], [322, 3, 3, 4], [13222, 49, 13, 49], [130, 10, 2, 2]])

I wanted to create a 2D array, in this case a 15*5 array.

However, when I check a.shape, it returns (15,).

What is wrong with my array definition?

Upvotes: 2

Views: 4065

Answers (4)

Chuncheng

Reputation: 51

The problem in your example is that some rows have 5 elements and the other rows have 4. As a result, NumPy cannot create a 15*5 array from it. In your case it complains:

Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated.
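How this message surfaces depends on the NumPy version: older releases emit it as a deprecation warning and still return an object array, while NumPy 1.24 and later raise a ValueError instead. A minimal sketch that behaves the same on both, assuming a small hypothetical `rows` list in place of the question's data:

```python
import numpy as np

# Hypothetical ragged input: the rows have different lengths
rows = [[1, 2, 3], [4, 5]]

try:
    # Older NumPy: warns (or stays silent) and builds an object array
    a = np.array(rows)
except ValueError:
    # NumPy 1.24+: ragged input is an error unless dtype=object is requested
    a = np.array(rows, dtype=object)

print(a.shape)  # (2,)
print(a.dtype)  # object
```

Either way the result is a 1D array of two list objects, not a 2D array.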

There is "some fundamental mistreatment/misinterpretation of the original data", according to Terasa. So the first thing you have to do is find out where the problem is.

If it is safe to do so, and the only goal is to build a matrix, one solution is to let pandas handle the missing values and then convert the DataFrame into a NumPy array:

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 1, 1, 1, 2], [2, 2, 2, 3], [3, 3, 3, 4], [13, 49, 13, 49], [10, 10, 2, 2],
             [11, 1, 1, 1, 2], [22, 2, 2, 3], [33, 3, 3, 4], [133, 49, 13, 49], [100, 10, 2, 2],
             [5, 1, 1, 1, 2], [32, 2, 2, 3], [322, 3, 3, 4], [13222, 49, 13, 49], [130, 10, 2, 2]])

a = df.to_numpy()

You can run the code directly to see the generated matrix. pandas fills the missing fifth entry of each short row with NaN.
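One side effect of this approach is worth checking before relying on it: NaN cannot be stored in an integer column, so the converted array comes back as float64. A short sketch on a three-row subset of the data (the subset and the name `rows` are chosen here only for illustration):

```python
import numpy as np
import pandas as pd

# First three rows of the question's data: one row of 5, two rows of 4
rows = [[1, 1, 1, 1, 2], [2, 2, 2, 3], [3, 3, 3, 4]]

a = pd.DataFrame(rows).to_numpy()

print(a.shape)            # (3, 5)
print(a.dtype)            # float64
print(np.isnan(a[1, 4]))  # True: the short row was padded on the right
```

If the NaN values are not meaningful for your data, fix the input rather than padding it.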

Upvotes: 1

Akshay Sehgal

Reputation: 19322

TL;DR: Your individual lists have different lengths, which forces NumPy to create a 1D array of list objects rather than a 2D array of integers/floats.

A NumPy array with more than one dimension requires every sublist along an axis to have the same number of elements. Otherwise, you are left with a 1D array of objects.


This is what is happening with your array. You have a list of lists whose sublists contain a variable number of elements (some 4 and some 5). During conversion, this turns into a (15,) NumPy array that holds 15 separate list objects.

a = np.array([[1, 1, 1, 1, 2], [2, 2, 2, 3], [3, 3, 3, 4], [13, 49, 13, 49]
##           |______________|  |__________|
##                   |               |
##            5 length            4 length

A quick demonstration -

#Variable length sublists
print(np.array([[1,2,3], [4,5]]))

#Fixed length sublists
print(np.array([[1,2,3], [4,5,6]]))

Output:

array([list([1, 2, 3]), list([4, 5])], dtype=object)  #This is (2,)

array([[1, 2, 3],                    #This is (2,3)
       [4, 5, 6]])

You might want to either:

  1. Fix the number of elements in each sublist, or
  2. Pad the shorter sublists so that all of them have the same length.
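The padding option can be sketched in plain NumPy as follows. This is only one possible way to do it; the fill value -1 and the names `rows`, `width` and `padded` are arbitrary choices for the illustration, not something from the question:

```python
import numpy as np

rows = [[1, 2, 3], [4, 5]]         # ragged input
width = max(len(r) for r in rows)  # the longest row decides the column count

# Pre-fill with a sentinel value, then copy each row into the left part
padded = np.full((len(rows), width), -1, dtype=int)
for i, r in enumerate(rows):
    padded[i, :len(r)] = r

print(padded.shape)  # (2, 3)
print(padded)
```

Pick a fill value that cannot be confused with real data, or the padding will be indistinguishable from genuine entries later on.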

Upvotes: 6

Jan Christoph Terasa

Reputation: 5935

Check what your array really is:

import numpy as np

a = np.array([[1, 1, 1, 1, 2], [2, 2, 2, 3], [3, 3, 3, 4], [13, 49, 13, 49], [10, 10, 2, 2],
             [11, 1, 1, 1, 2], [22, 2, 2, 3], [33, 3, 3, 4], [133, 49, 13, 49], [100, 10, 2, 2],
             [5, 1, 1, 1, 2], [32, 2, 2, 3], [322, 3, 3, 4], [13222, 49, 13, 49], [130, 10, 2, 2]])

print(repr(a))

It is an array of list objects:

array([list([1, 1, 1, 1, 2]), list([2, 2, 2, 3]), list([3, 3, 3, 4]),
       list([13, 49, 13, 49]), list([10, 10, 2, 2]),
       list([11, 1, 1, 1, 2]), list([22, 2, 2, 3]), list([33, 3, 3, 4]),
       list([133, 49, 13, 49]), list([100, 10, 2, 2]),
       list([5, 1, 1, 1, 2]), list([32, 2, 2, 3]), list([322, 3, 3, 4]),
       list([13222, 49, 13, 49]), list([130, 10, 2, 2])], dtype=object)

The problem is that the number of elements in each sublist varies (some of them 4, some of them 5), so it can only be stored as a 1D array of lists. To get a 2D array, the number of elements in your sublists must be equal.

Upvotes: 4

Mad Physicist

Reputation: 114240

Take a look at a.dtype: it's O, for object. That generally happens when you try to create an array from ragged input. In this case, you can see that only the sublists at indices 0, 5, and 10 actually have five elements. All the other sublists have four elements. If you want to have a (15, 5) array, you need to make sure that each row has 5 elements.
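Both observations can be checked with a few lines. The sketch below passes dtype=object explicitly so that it also runs on NumPy versions that reject ragged input; the names `rows` and `five_long` are introduced here for illustration:

```python
import numpy as np

rows = [[1, 1, 1, 1, 2], [2, 2, 2, 3], [3, 3, 3, 4], [13, 49, 13, 49], [10, 10, 2, 2],
        [11, 1, 1, 1, 2], [22, 2, 2, 3], [33, 3, 3, 4], [133, 49, 13, 49], [100, 10, 2, 2],
        [5, 1, 1, 1, 2], [32, 2, 2, 3], [322, 3, 3, 4], [13222, 49, 13, 49], [130, 10, 2, 2]]

a = np.array(rows, dtype=object)  # explicit object dtype for ragged input
print(a.dtype)  # object
print(a.shape)  # (15,)

# Indices of the sublists that really have five elements
five_long = [i for i, r in enumerate(rows) if len(r) == 5]
print(five_long)  # [0, 5, 10]
```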

Upvotes: 3
