odbhut.shei.chhele

Reputation: 6244

Cannot convert a pandas series into a 2d array?

I have a pandas Series of size 10240. Each value in the Series is a 2d array. I turn each of these 2d arrays into a 1d array of size 143. After that I convert the Series into a numpy array, so I should get a 2d array of shape (10240, 143), right? But that is not what I get. I get an array of shape (10240,) and size 10240. I don't know what I am doing wrong. My code is given below.

def get_subjects(x):
  print(type(x)) #2d list
  print(len(x)) # 2, 143
  x = to_categorical(x, num_classes=len(subjects)+1).sum(axis=0)
  print(type(x)) # numpy array
  print(x.size) # 143
  return x

print(type(train_data["subject_id"])) # pandas series
print(train_data["subject_id"].size) # 10240
subject_train = train_data["subject_id"].apply(lambda x: get_subjects(x)).to_numpy()
print(type(subject_train)) # numpy array
print(subject_train.size) # 10240 

Upvotes: 0

Views: 125

Answers (1)

plasmon360

Reputation: 4199

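You do not get the expected shape because 'subject_train' is an array of arrays: a 1d array of dtype object whose elements are themselves 1d arrays, so numpy only sees one dimension of length 10240. To avoid this, you can split the 1d array returned by 'get_subjects' into multiple columns and then convert the result to a numpy array, as shown below.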

import pandas as pd
import numpy as np
# df has 5 rows and each cell holds a 3x4 array
df = pd.DataFrame({'data':[np.random.randint(low =1, high =10, size=(3,4)),
                           np.random.randint(low =1, high =10, size=(3,4)),
                           np.random.randint(low =1, high =10, size=(3,4)),
                           np.random.randint(low =1, high =10, size=(3,4)),
                           np.random.randint(low =1, high =10, size=(3,4)),
                          ]})

def get_subjects(x):
  # stand-in for: x = to_categorical(x, num_classes=len(subjects)+1).sum(axis=0)
  x = x.reshape(-1) # flattens the 3x4 array into a 1d array of length 12
  return x

# apply(pd.Series) splits each row's 1d array of length 12 into 12 separate columns
print(df["data"].apply(lambda x: get_subjects(x)).apply(pd.Series).to_numpy().shape)

results in

(5, 12)
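
As a possible alternative to apply(pd.Series), the per-row arrays can be stacked directly with np.stack, which avoids building an intermediate DataFrame. The sketch below is only an illustration under assumptions: it reuses the toy 3x4 data from above rather than the real 'subject_id' column, and 'flat', 'as_object' and 'stacked' are names made up for this example. It relies on every row returning a 1d array of the same length, which should hold for the 143-element arrays in the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy data (assumption): 5 rows, each cell holds a 3x4 integer array
df = pd.DataFrame({"data": [rng.integers(1, 10, size=(3, 4)) for _ in range(5)]})

# Stand-in for get_subjects: flatten each cell to a 1d array of length 12
flat = df["data"].apply(lambda x: x.reshape(-1))

# Converting directly reproduces the problem: a 1d object array of arrays
as_object = flat.to_numpy()
print(as_object.dtype, as_object.shape)

# Stacking the row arrays along a new first axis gives a real 2d array
stacked = np.stack(flat.tolist())
print(stacked.shape)
```

Applied to the question, this would correspond to calling np.stack on the list of arrays returned by the apply step instead of calling to_numpy() on it directly.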

Upvotes: 1
