Reputation: 436
I have an array
[[0, 1, 0, 0] [0, 1, 0, 0] [1, 0, 0, 0] ..., [0, 1, 0, 0] [0, 1, 0, 0]
[1, 0, 0, 0]]
of shape (38485,).
I want to reshape it to (38485, 4), like this:
[[0, 1, 0, 0]
[0, 1, 0, 0]
[1, 0, 0, 0]
.
.
.
[0, 1, 0, 0]
[0, 1, 0, 0]
[1, 0, 0, 0]]
But when I try array.reshape(-1,4),
it throws the error: ValueError: cannot reshape array of size 38485 into shape (4)
My code to get the array:
import numpy as np
import pandas as pd

dataset = pd.read_csv('train.csv')
y = dataset.iloc[:, 6]
fr = np.array([1, 0, 0, 0])
re = np.array([0, 1, 0, 0])
le = np.array([0, 0, 1, 0])
ri = np.array([0, 0, 0, 1])
for i in range(y.shape[0]):
    if y[i] == "Front":
        y[i] = fr
    elif y[i] == "Rear":
        y[i] = re
    elif y[i] == "Left":
        y[i] = le
    elif y[i] == "Right":
        y[i] = ri
array = y.values
Is there any way I can accomplish this?
I fixed this with:
array = np.array([[n for n in row] for row in array])
Thanks to wim.
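Rebuilding the array from its elements works because np.array then receives a nested sequence of equal-length rows instead of an array of objects. A minimal sketch on a hypothetical 3-row object array `obj` standing in for y.values; np.stack is shown as an equivalent alternative, assuming every element has exactly 4 entries.

```python
import numpy as np

# Hypothetical object array of one-hot rows, like y.values in the question.
obj = np.empty(3, dtype=object)
obj[0] = np.array([0, 1, 0, 0])
obj[1] = np.array([0, 1, 0, 0])
obj[2] = np.array([1, 0, 0, 0])

# The fix from the question: rebuild a nested list, then convert.
fixed = np.array([[n for n in row] for row in obj])

# Equivalent alternative: stack the rows along a new first axis.
stacked = np.stack(list(obj))

print(fixed.shape)    # (3, 4)
print(stacked.shape)  # (3, 4)
```

np.stack raises a ValueError if any row has a different length, which makes a malformed row easier to spot than a silent object array.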
Upvotes: 3
Views: 16377
Reputation: 54283
The variable y (a pandas Series, whose .values is a NumPy array) contained both strings and numpy.arrays. Its dtype is object, so NumPy doesn't treat it as a table, even though by the end of the preprocessing it is full of 4-element numpy.arrays.
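This also explains the number in the error message: for an object array, `size` counts the stored objects, not the numbers inside them. A minimal sketch, assuming a hypothetical 5-row object array `labels` in place of the real 38485-row data:

```python
import numpy as np

# Hypothetical stand-in for y.values: 5 one-hot rows stored as Python objects.
rows = [np.array([0, 1, 0, 0]), np.array([1, 0, 0, 0]),
        np.array([0, 0, 1, 0]), np.array([0, 0, 0, 1]),
        np.array([0, 1, 0, 0])]

labels = np.empty(len(rows), dtype=object)  # 1-D array of object references
for i, row in enumerate(rows):
    labels[i] = row  # each slot holds a whole 4-element array

print(labels.shape)  # (5,)
print(labels.size)   # 5: one per stored object, not 5 * 4 numbers

try:
    labels.reshape(-1, 4)  # fails: 5 elements cannot be split into rows of 4
except ValueError as exc:
    print("ValueError:", exc)
```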
You could either avoid mixing object types by using a variable other than y, or convert y.values with:
array = np.array([x.astype('int32') for x in y.values])
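A minimal sketch of the first option, assuming the labels are the four strings from the question; `labels` is a hypothetical stand-in for dataset.iloc[:, 6] and `ONE_HOT` is an illustrative name. Building the rows into a new list, instead of writing arrays back into y, keeps everything numeric, so np.array produces a 2-D array directly.

```python
import numpy as np

# One-hot row for each label, matching fr/re/le/ri in the question.
ONE_HOT = {
    "Front": [1, 0, 0, 0],
    "Rear":  [0, 1, 0, 0],
    "Left":  [0, 0, 1, 0],
    "Right": [0, 0, 0, 1],
}

# Hypothetical stand-in for dataset.iloc[:, 6]
labels = ["Rear", "Rear", "Front", "Left", "Right"]

# Build a new list instead of overwriting the string column in place;
# an unknown label raises KeyError instead of silently staying a string.
array = np.array([ONE_HOT[label] for label in labels], dtype="int32")

print(array.shape)  # (5, 4)
print(array.dtype)  # int32
```

In the original loop, any label that matches none of the four branches stays a string, which may be one way non-array elements end up in the column; the dictionary lookup makes that case fail loudly.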
As an example:
import numpy as np
y = np.array(["left", "right"], dtype = "object")
y[0] = np.array([1,0])
y[1] = np.array([0,1])
print(y)
# [[1 0] [0 1]]
print(y.dtype)
# object
print(y.shape)
# (2,)
y = np.array([x.astype('int32') for x in y])
print(y)
# [[1 0]
# [0 1]]
print(y.dtype)
# int32
print(y.shape)
# (2, 2)
Your array is somehow incomplete. It has 38485 elements, many of which look like 4-element arrays. But somewhere in the middle, there must be at least one inner array which doesn't have 4 elements. Or you might have a mix of collection types (e.g. lists and arrays).
That would explain why the shape has no second dimension.
Here's an example with one (8, 4) array and a copy of it with just one element missing:
import numpy as np
data = np.array([[0, 1, 0, 0],[0, 1, 0, 0],[1, 0, 0, 0] , [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0],[1, 0, 0, 0]])
print(data.shape)
# (8, 4)
print(data.dtype)
# int64
print(set(len(sub_array) for sub_array in data))
# {4}
print(data.reshape(-1, 4))
# [[0 1 0 0]
# [0 1 0 0]
# [1 0 0 0]
# [0 1 0 0]
# [0 1 0 0]
# [0 1 0 0]
# [0 1 0 0]
# [1 0 0 0]]
# dtype=object is required for ragged input: NumPy >= 1.24 raises a ValueError without it
broken_data = np.array([[0, 1, 0, 0],[0, 1, 0, 0],[1, 0, 0, 0] , [1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0],[1, 0, 0, 0]], dtype=object)
print(broken_data.shape)
# (8,)
print(broken_data.dtype)
# object
print(set(len(sub_array) for sub_array in broken_data))
# {3, 4}
print(broken_data.reshape(-1, 4))
# [[[0, 1, 0, 0] [0, 1, 0, 0] [1, 0, 0, 0] [1, 0, 0]]
# [[0, 1, 0, 0] [0, 1, 0, 0] [0, 1, 0, 0] [1, 0, 0, 0]]]
print([sub_array for sub_array in broken_data if len(sub_array) != 4])
# [[1, 0, 0]]
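Find the sub-arrays that don't have exactly 4 elements and either filter them out or fix them. Once every sub-array has 4 elements, you can build the 2-D array directly with np.array (or np.stack) instead of calling reshape. It will have 38485 rows only if you fix the bad rows rather than drop them.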
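A minimal sketch of that clean-up, assuming a small hypothetical ragged object array `broken` built like broken_data above. The zero-padding repair is a hypothetical rule for illustration only, since the correct fix depends on why a row is short.

```python
import numpy as np

# Hypothetical ragged object array, mirroring broken_data above.
broken = np.array([[0, 1, 0, 0], [1, 0, 0], [0, 0, 1, 0]], dtype=object)

# Indices of the rows that do not have exactly 4 entries.
bad = [i for i, row in enumerate(broken) if len(row) != 4]
print(bad)  # [1]

# Option 1: drop the short rows (the result has fewer rows than the input).
kept = np.array([row for row in broken if len(row) == 4], dtype=int)
print(kept.shape)  # (2, 4)

# Option 2: repair the short rows. Zero-padding on the right is a
# hypothetical rule for illustration only, and it assumes no row is
# longer than 4; use whatever repair is correct for your data.
repaired = np.array([list(row) + [0] * (4 - len(row)) for row in broken], dtype=int)
print(repaired.shape)  # (3, 4)
```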
Upvotes: 2
Reputation: 363193
For reshape(-1, 4) to work, the array's size must be a multiple of 4, and 38485 is not. If it were, the reshape as you have written it would work correctly:
array.reshape(-1,4)
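A quick check of the arithmetic, as a minimal sketch: `flat` and the helper `reshape_rows` are hypothetical names for illustration. Since 38485 = 4 × 9621 + 1, one element is left over.

```python
import numpy as np

# 38485 = 4 * 9621 + 1, so one element is left over.
print(divmod(38485, 4))  # (9621, 1)

# Hypothetical flat numeric array of 8 values: 8 is divisible by 4.
flat = np.arange(8)
print(flat.reshape(-1, 4).shape)  # (2, 4)

def reshape_rows(a, width):
    """Reshape a into rows of `width`, failing early with a clearer message."""
    if a.size % width != 0:
        raise ValueError(f"size {a.size} is not a multiple of {width}")
    return a.reshape(-1, width)

print(reshape_rows(np.arange(12), 4).shape)  # (3, 4)
```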
Upvotes: 1