pramod

Reputation: 173

How do I convert a sparse NumPy array to a DataFrame?

Below is the code snippet:

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [2, 3, 4])], remainder='passthrough')
X = np.array(ct.fit_transform(x_data))
X.shape

I get the following output for the shape:

()

When I print X, I get the following output:

array(<8820x35 sparse matrix of type '<class 'numpy.float64'>'
    with 41527 stored elements in Compressed Sparse Row format>, dtype=object)

Now, when I try to convert this array to a DataFrame with

X = pd.DataFrame(X)

I get the following error:

ValueError: Must pass 2-d input

How do I convert my NumPy array to a DataFrame?

Upvotes: 0

Views: 2169

Answers (2)

hpaulj

Reputation: 231385

It looks like

ct.fit_transform(x_data)

produces a SciPy sparse matrix, and

np.array(...)

just wraps that matrix in a 0-d object-dtype array, which is why X.shape is ():

array(<8820x35 sparse matrix of type '<class 'numpy.float64'>'
    with 41527 stored elements in Compressed Sparse Row format>, dtype=object)

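Use the toarray() method, or the .A shorthand attribute where your SciPy version still provides it, to convert the sparse matrix to a proper 2-D NumPy array: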

X = ct.fit_transform(x_data).A
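The difference between wrapping and converting can be checked on a small example. This is only a sketch: the 3x3 matrix `sp` is a hypothetical stand-in for the output of ct.fit_transform(x_data), not the asker's data.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for the sparse output of ct.fit_transform(x_data)
sp = sparse.csr_matrix(np.eye(3))

# np.array() does not densify a sparse matrix; it stores the matrix
# object itself inside a 0-d array with dtype=object
wrapped = np.array(sp)

# toarray() builds an ordinary dense 2-D array with the same values
dense = sp.toarray()

print(wrapped.shape, wrapped.dtype)  # shape of the wrapper, not of the data
print(dense.shape, dense.dtype)
```

The wrapper has no rows or columns of its own, which is why pandas complains that the input is not 2-d.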

Upvotes: 1

Fading Origami

Reputation: 203

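First, convert the sparse matrix (a csr_matrix) to a normal dense array, then build the DataFrame from it. Note that toarray() has to be called on the sparse matrix returned by ct.fit_transform(x_data), before it is wrapped in np.array():

X = ct.fit_transform(x_data)
X = X.toarray()
df = pd.DataFrame(X)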

The above should work
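As a rough illustration of this step, here is a minimal sketch using a small made-up matrix `sp` in place of the real transformer output (the values are assumptions, chosen only for the demo). It also shows pd.DataFrame.sparse.from_spmatrix, an alternative not mentioned above, which may be preferable if the dense array would be too large to hold in memory.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Made-up sparse matrix standing in for ct.fit_transform(x_data)
sp = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                 [0.0, 0.0, 3.0]]))

# Option 1: densify first, then build a regular DataFrame
df_dense = pd.DataFrame(sp.toarray())

# Option 2: keep the data sparse by using pandas' sparse accessor
df_sparse = pd.DataFrame.sparse.from_spmatrix(sp)

print(df_dense.shape)
print(df_sparse.dtypes)
```

Both frames hold the same values; the second one stores each column as a pandas sparse array instead of a dense one.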

Upvotes: 2
