Reputation: 6244
I have a pandas Series. Its size is 10240. Each value in the Series is a 2d array, and I turn each of these 2d arrays into a 1d array of size 143. After that I convert the Series into a NumPy array, so I should get a 2d array of shape (10240, 143), right? But I am not getting that: I get an array of shape (10240,) and size 10240. I don't know what I am doing wrong. My code is given below.
from tensorflow.keras.utils import to_categorical  # assuming the Keras bundled with TensorFlow

def get_subjects(x):
    print(type(x))  # 2d list
    print(len(x))   # 2, 143
    x = to_categorical(x, num_classes=len(subjects) + 1).sum(axis=0)
    print(type(x))  # numpy array
    print(x.size)   # 143
    return x

print(type(train_data["subject_id"]))  # pandas series
print(train_data["subject_id"].size)   # 10240
subject_train = train_data["subject_id"].apply(lambda x: get_subjects(x)).to_numpy()
print(type(subject_train))  # numpy array
print(subject_train.size)   # 10240
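The behaviour can be reproduced with small made-up data (three rows of length-4 vectors stand in for the 10240 rows of length 143; the Series `s` is hypothetical). When every cell of a Series holds a NumPy array, `to_numpy()` returns a 1-D array of dtype `object` whose elements are those arrays, so its shape is `(n,)` rather than `(n, m)`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in: 3 rows, each cell holding a 1-D array of length 4
s = pd.Series([np.arange(4), np.arange(4, 8), np.arange(8, 12)])

arr = s.to_numpy()
print(arr.dtype)     # object: each element is itself an ndarray
print(arr.shape)     # (3,)
print(arr[0].shape)  # (4,): the inner arrays keep their own shape
```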
Upvotes: 0
Views: 125
Reputation: 4199
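You are unable to get the expected shape because 'subject_train' is an array of arrays: a 1d array of dtype object with one element per row, where each element is the 1d array returned by 'get_subjects'. To avoid this, you can split the 1d array returned by 'get_subjects' into separate columns and then convert the result to a NumPy array, as shown below.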
import pandas as pd
import numpy as np
# df has 5 rows and each cell holds a 3x4 array
df = pd.DataFrame({'data': [np.random.randint(low=1, high=10, size=(3, 4)) for _ in range(5)]})
def get_subjects(x):
    # stands in for x = to_categorical(x, num_classes=len(subjects)+1).sum(axis=0)
    x = x.reshape(-1)  # flattens the 3x4 array into a 1d array of 12 elements
    return x
# apply(pd.Series) splits each row's 12-element array into 12 separate columns
print(df["data"].apply(lambda x: get_subjects(x)).apply(pd.Series).to_numpy().shape)
results in
(5, 12)
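An alternative that skips `apply(pd.Series)`, which builds an intermediate DataFrame and can be slow on 10240 rows, is to stack the per-row arrays directly with `np.stack`. This only works if every row's array has the same length. A minimal sketch using the same kind of toy data, where `get_subjects` is again a placeholder that only flattens instead of calling `to_categorical`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# 5 rows, each cell holding a 3x4 array (toy data, as in the example above)
df = pd.DataFrame({"data": [rng.integers(1, 10, size=(3, 4)) for _ in range(5)]})

def get_subjects(x):
    # Placeholder for the to_categorical(...).sum(axis=0) call: flatten to 1-D
    return x.reshape(-1)

# A list of equal-length 1-D arrays -> one (n_rows, n_cols) numeric array
stacked = np.stack(df["data"].apply(get_subjects).to_list())
print(stacked.shape)  # (5, 12)
```

Applied to the question's data, the same pattern would be `np.stack(train_data["subject_id"].apply(get_subjects).to_list())`, which should give shape (10240, 143) if every returned vector has length 143.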
Upvotes: 1