sdkayb

Reputation: 146

How to convert a pandas DataFrame column containing string objects to a NumPy array?

I'm working on a project and need to do some data preprocessing. I have a DataFrame that looks like this (this is just a simplified example):

index | pixels 
0     | 10 20 30 40 
1     | 11 12 13 14

I want to convert it to a NumPy array of shape (2, 2, 2, 1). The dtype of the pixels column is object. Is there a way to do this without loops? I have a DataFrame with 28k rows of large images, and the looping approach I tried takes too long to run on my machine.

Upvotes: 0

Views: 1295

Answers (1)

Henry Ecker

Reputation: 35686

Use str.split + astype + to_numpy + reshape:

a = (
    df['pixels'].str.split(' ', expand=True)
        .astype(int).to_numpy()
        .reshape((2, 2, 2, 1))
)

a:

[[[[10]
   [20]]

  [[30]
   [40]]]


 [[[11]
   [12]]

  [[13]
   [14]]]]
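
To see what each call in the chain contributes, here is a rough step-by-step sketch of the same conversion. The intermediate names (`parts`, `ints`, `flat`, `images`) are only illustrative and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'pixels': ['10 20 30 40', '11 12 13 14']})

# Step 1: split each string on spaces into one column per pixel
# (the values are still strings at this point)
parts = df['pixels'].str.split(' ', expand=True)

# Step 2: convert every column from string to integer
ints = parts.astype(int)

# Step 3: get a plain 2-D NumPy array, one row per image
flat = ints.to_numpy()

# Step 4: reshape each row of 4 values into a 2x2 image with 1 channel
images = flat.reshape((2, 2, 2, 1))

print(parts.shape, flat.shape, images.shape)
```

All four steps are vectorized, so there is no explicit Python loop over the rows.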

Complete Working Example:

import pandas as pd

df = pd.DataFrame({'pixels': ['10 20 30 40', '11 12 13 14']})

a = (
    df['pixels'].str.split(' ', expand=True)
        .astype(int).to_numpy()
        .reshape((2, 2, 2, 1))
)
print(a)
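
For a larger DataFrame (such as the 28k rows mentioned in the question), the hard-coded shape could probably be generalized along these lines. This is only a sketch: the helper name `pixels_to_array` and the `height` and `width` parameters are hypothetical, and it assumes every row holds exactly `height * width` values for a single-channel image. Passing `-1` lets NumPy infer the number of images, and calling `str.split` without a separator splits on any run of whitespace, which may help if some rows have stray leading or trailing spaces.

```python
import pandas as pd


def pixels_to_array(df, height, width, column='pixels'):
    """Hypothetical helper: turn a column of space-separated pixel strings
    into an array of shape (n_images, height, width, 1)."""
    # No separator given: split on any whitespace, ignoring leading/trailing spaces
    flat = df[column].str.split(expand=True).astype(int).to_numpy()
    # -1 lets NumPy work out the number of images from the data
    return flat.reshape((-1, height, width, 1))


df = pd.DataFrame({'pixels': ['10 20 30 40', '11 12 13 14', '15 16 17 18 ']})

a = pixels_to_array(df, height=2, width=2)
print(a.shape)  # (3, 2, 2, 1)
```

If any row has a different number of values, `astype(int)` or `reshape` will raise an error, so this sketch will not silently produce a misaligned array.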

Upvotes: 3
