Mostafa

Reputation: 1551

How to get a 2D NumPy array from a pandas DataFrame? (wrong shape)

I want to get a 2D NumPy array from a column of a pandas DataFrame df that holds a NumPy vector in each row. But if I do

df.values.shape

I get (3,) instead of (3, 5)

(assuming each NumPy vector in the DataFrame has 5 elements and the DataFrame has 3 rows).

What is the correct way to do this?
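A minimal reproduction of this setup might look like the sketch below. The column name vec and the sample values are made up for illustration. It shows why the shape comes out wrong: selecting the column yields a 1D object array whose elements are themselves whole vectors, so NumPy never sees a 3x5 block of numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical setup: 3 rows, each cell holding a length-5 NumPy vector
df = pd.DataFrame({"vec": [np.arange(5) + 10 * i for i in range(3)]})

# Selecting the column gives a 1D object array: each element is a whole vector
col = df["vec"].values
print(col.dtype, col.shape)  # object (3,)

# The one-column DataFrame itself is 2D, but still holds objects: (3, 1)
print(df.values.shape)  # (3, 1)
```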

Upvotes: 6

Views: 3970

Answers (2)

unutbu

Reputation: 880777

Ideally, avoid getting into this situation by finding a different way to define the DataFrame in the first place. However, if your DataFrame looks like this:

import numpy as np
import pandas as pd

s = pd.Series([np.random.randint(20, size=(5,)) for i in range(3)])
df = pd.DataFrame(s, columns=['foo'])
#                   foo
# 0   [4, 14, 9, 16, 5]
# 1  [16, 16, 5, 4, 19]
# 2  [7, 10, 15, 13, 2]

then you could convert it to a DataFrame of shape (3,5) by calling pd.DataFrame on a list of arrays:

pd.DataFrame(df['foo'].tolist())
#     0   1   2   3   4
# 0   4  14   9  16   5
# 1  16  16   5   4  19
# 2   7  10  15  13   2

pd.DataFrame(df['foo'].tolist()).values.shape
# (3, 5)
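If only the NumPy array is needed, one possible alternative (a sketch, not part of the original answer) is to stack the per-row vectors directly with np.stack, which skips building the intermediate DataFrame. It assumes every vector has the same length. The setup below builds the same 'foo' column through a dict, which is equivalent for this purpose.

```python
import numpy as np
import pandas as pd

# Same layout as in the answer: one column 'foo' with a length-5 vector per row
s = pd.Series([np.random.randint(20, size=(5,)) for i in range(3)])
df = pd.DataFrame({'foo': s})

# np.stack joins the equal-length 1D arrays along a new first axis
arr = np.stack(df['foo'].tolist())
print(arr.shape)  # (3, 5)
```

np.vstack gives the same result here. Both raise a ValueError if the vectors differ in length, which is a useful early signal that the column is not really a rectangular block.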

Upvotes: 8

Ujjwal

Reputation: 3168

I am not sure what you want, but df.values.shape seems to give the correct result.

import pandas as pd
import numpy as np
from pandas import DataFrame

df3 = DataFrame(np.random.randn(3, 5), columns=['a', 'b', 'c', 'd', 'e'])

print(df3)
#          a         b         c         d         e
#0 -0.221059  1.206064 -1.359214  0.674061  0.547711
#1  0.246188  0.628944  0.528552  0.179939 -0.019213
#2  0.080049  0.579549  1.790376 -1.301700  1.372702

df3.values.shape
# (3, 5)

df3["a"]
#0   -0.221059
#1    0.246188
#2    0.080049

df3[:1]
#     a         b           c           d           e
#0  -0.221059   1.206064    -1.359214   0.674061    0.547711
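The (3, 5) result above comes from storing each of the five values in its own column. The sketch below contrasts that with the layout described in the question, assuming all five values are packed into a single cell per row (the column name vec is hypothetical). The packed frame is only one column wide, so .values has shape (3, 1) and holds objects rather than floats.

```python
import numpy as np
import pandas as pd

data = np.random.randn(3, 5)

# Layout from this answer: one scalar per cell, five columns
wide = pd.DataFrame(data, columns=['a', 'b', 'c', 'd', 'e'])
print(wide.values.shape)  # (3, 5)

# Hypothetical layout from the question: one length-5 vector per cell
packed = pd.DataFrame({'vec': list(data)})
print(packed.values.shape, packed.values.dtype)  # (3, 1) object
print(packed['vec'].values.shape)                # (3,)
```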

Upvotes: 1
