Vijayabhaskar J

Reputation: 436

Reverse flatten a NumPy array?

I have an array

[[0, 1, 0, 0] [0, 1, 0, 0] [1, 0, 0, 0] ..., [0, 1, 0, 0] [0, 1, 0, 0] [1, 0, 0, 0]]

of shape (38485,). I want to reshape it to (38485, 4), like this:

[[0, 1, 0, 0] 
[0, 1, 0, 0] 
[1, 0, 0, 0]
.
.
.
[0, 1, 0, 0]
[0, 1, 0, 0]
[1, 0, 0, 0]]

But when I try array.reshape(-1,4), it throws the error ValueError: cannot reshape array of size 38485 into shape (4)

My code to get array:

import numpy as np
import pandas as pd

dataset = pd.read_csv('train.csv')

y = dataset.iloc[:, 6]

fr=np.array([1,0,0,0])
re=np.array([0,1,0,0])
le=np.array([0,0,1,0])
ri=np.array([0,0,0,1])
for i in range(y.shape[0]):
    if y[i]=="Front":
        y[i]=fr
    elif y[i]=="Rear":
        y[i]=re
    elif y[i]=="Left":
        y[i]=le
    elif y[i]=="Right":
        y[i]=ri

array=y.values

Is there any way I can accomplish this?

I fixed this with:

array = np.array([[n for n in row] for row in array])

Thanks to wim.
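
The same conversion can also be sketched with np.stack, which joins equal-length arrays along a new axis. This is only an illustration: the three-element object array below is a hypothetical stand-in for y.values, and it assumes every element really is a 4-element array.

```python
import numpy as np

# Hypothetical stand-in for y.values: a 1-D object array holding 4-element arrays.
array = np.empty(3, dtype=object)
array[0] = np.array([0, 1, 0, 0])
array[1] = np.array([0, 1, 0, 0])
array[2] = np.array([1, 0, 0, 0])
print(array.shape)
# (3,)

# np.stack puts each inner array on its own row of a regular 2-D array.
stacked = np.stack(array)
print(stacked.shape)
# (3, 4)
```

If any inner array has a different length, np.stack raises a ValueError instead of silently producing an object array.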

Upvotes: 3

Views: 16377

Answers (2)

Eric Duminil

Reputation: 54283

Updated answer:

The variable y is a pandas Series that originally held strings and, after the loop, holds numpy.arrays. Its dtype is object, so y.values is a one-dimensional object array: numpy doesn't understand it's a table, even though it's full of 4-element numpy.arrays at the end of the preprocessing.

You could either avoid mixing object types by using a variable other than y, or convert y.values with:

array = np.array([x.astype('int32') for x in y.values])

As an example:

import numpy as np
y = np.array(["left", "right"], dtype = "object")
y[0] = np.array([1,0])
y[1] = np.array([0,1])
print(y)
# [[1 0] [0 1]]
print(y.dtype)
# object
print(y.shape)
# (2,)
y = np.array([x.astype('int32') for x in y])
print(y)
# [[1 0]
#  [0 1]]
print(y.dtype)
# int32
print(y.shape)
# (2, 2)
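
The first option, keeping the encoded values in a separate variable, might look like the sketch below. The labels list and the one_hot dictionary are hypothetical names introduced here; the four rows copy fr, re, le and ri from the question, and a plain list stands in for the CSV column.

```python
import numpy as np

# Hypothetical stand-in for the label column (dataset.iloc[:, 6] in the question).
labels = ["Front", "Rear", "Left", "Right", "Rear"]

# One row per label, matching fr, re, le and ri in the question.
one_hot = {
    "Front": [1, 0, 0, 0],
    "Rear": [0, 1, 0, 0],
    "Left": [0, 0, 1, 0],
    "Right": [0, 0, 0, 1],
}

# Build a new 2-D integer array; the label column itself is never modified,
# so strings and arrays are never mixed in one object-dtype container.
encoded = np.array([one_hot[label] for label in labels], dtype="int32")

print(encoded.shape)
# (5, 4)
print(encoded.dtype)
# int32
```

Iterating over a pandas Series yields its values, so the same comprehension should work with y in place of labels.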

Original answer:

Your array is somehow incomplete. It has 38485 elements, many of which look like 4-element arrays. But somewhere in the middle, there must be at least one inner array which doesn't have 4 elements. Or you might have a mix of collections (list, array, ...).

That could be why the second value isn't defined in the shape.

Here's an example with one (8, 4) array and a copy of it, with just one element missing:

import numpy as np

data = np.array([[0, 1, 0, 0],[0, 1, 0, 0],[1, 0, 0, 0] , [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0],[1, 0, 0, 0]])
print(data.shape)
# (8, 4)
print(data.dtype)
# int64 (the default integer type; platform dependent)
print(set(len(sub_array) for sub_array in data))
# {4}
print(data.reshape(-1, 4))
# [[0 1 0 0]
#  [0 1 0 0]
#  [1 0 0 0]
#  [0 1 0 0]
#  [0 1 0 0]
#  [0 1 0 0]
#  [0 1 0 0]
#  [1 0 0 0]]

# Ragged rows need dtype=object; recent NumPy versions raise a ValueError without it.
broken_data = np.array([[0, 1, 0, 0],[0, 1, 0, 0],[1, 0, 0, 0] , [1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0],[1, 0, 0, 0]], dtype=object)
print(broken_data.shape)
# (8,)
print(broken_data.dtype)
# object
print(set(len(sub_array) for sub_array in broken_data))
# {3, 4}
print(broken_data.reshape(-1, 4).shape)
# (2, 4): the 8 lists are regrouped as cells, not split into 4 columns
print([sub_array for sub_array in broken_data if len(sub_array) != 4])
# [[1, 0, 0]]

Find the sub-arrays that don't have exactly 4 elements and either filter them out or modify them.

You'll then have a (38485,4) array, and you won't have to call reshape.
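
A minimal sketch of the filtering step, assuming the data is a one-dimensional object array of lists like broken_data above. The names rows, bad and clean are illustrative only.

```python
import numpy as np

# Small ragged example: the middle row is missing one element.
rows = [[0, 1, 0, 0], [1, 0, 0], [1, 0, 0, 0]]
broken = np.array(rows, dtype=object)

# Locate the offending rows so they can be inspected or repaired.
bad = [(i, row) for i, row in enumerate(broken) if len(row) != 4]
print(bad)
# [(1, [1, 0, 0])]

# Keep only the well-formed rows; the result is a regular 2-D array.
clean = np.array([row for row in broken if len(row) == 4])
print(clean.shape)
# (2, 4)
```

Filtering drops rows, so the first dimension shrinks accordingly; repairing the bad rows instead would preserve the original row count.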

Upvotes: 2

wim

Reputation: 363193

The array length must be a multiple of 4. 38485 is not a multiple of 4. Otherwise, the reshape as you have written it should work correctly:

array.reshape(-1,4)
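
The arithmetic can be checked directly: 38485 = 4 * 9621 + 1, so one element is left over. The sketch below uses np.arange to build throwaway arrays of the two sizes; the names flat, odd and message are illustrative.

```python
import numpy as np

# 38484 is a multiple of 4, so the reshape succeeds.
flat = np.arange(38484)
print(flat.reshape(-1, 4).shape)
# (9621, 4)

# 38485 leaves a remainder of 1, so the same call fails.
print(38485 % 4)
# 1
odd = np.arange(38485)
try:
    odd.reshape(-1, 4)
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```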

Upvotes: 1
