ilyas patanam

Reputation: 5324

Shape returned by Pandas ValueError does not match the dataframe shape?

My understanding is that pd.DataFrame().shape returns (n_rows, n_columns). However, when constructing a dataframe whose index does not match the shape of the data, pandas raises a ValueError that reports the shape as (n_columns, n_rows).

Example:

df_2 = pd.DataFrame(np.random.randn(10,2), index = range(9))

ValueError: Shape of passed values is (2, 10), indices imply (2, 9)

Why does the ValueError not print:

Shape of passed values is (10, 2), indices imply (9, 2)

Pandas Version: '0.17.1'

Upvotes: 4

Views: 3646

Answers (2)

Andrew Pederson

Reputation: 167

print range(9) prints [0, 1, 2, 3, 4, 5, 6, 7, 8], so an index of 9 values starting at zero cannot label the 10 rows of a (10, 2) array (20 values in two columns of 10). The index has to fit the dimensions of the NumPy array you're converting to a pandas DataFrame.

Upvotes: 0

Andy Hayden

Reputation: 375915

When pandas says "indices" here it means the index and the columns (they are both of type Index).

In [11]: df = pd.DataFrame(np.random.randn(3,2))

In [12]: df.index
Out[12]: Int64Index([0, 1, 2], dtype='int64')

In [13]: df.columns
Out[13]: Int64Index([0, 1], dtype='int64')
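As a minimal sketch of the same point (the variable names both_are_index and implied_shape are mine, and the concrete Index subclass you see may differ between pandas versions), you can check that both axis labels are Index objects and that their lengths give the shape:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 2))

# Both axis labels are Index objects; the exact subclass depends on the pandas version.
both_are_index = isinstance(df.index, pd.Index) and isinstance(df.columns, pd.Index)

# The shape that the "indices imply" is simply the lengths of those two objects.
implied_shape = (len(df.index), len(df.columns))

print(both_are_index, implied_shape, df.shape)
```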

You are passing an .index Index of length 9 and (implicitly) a .columns Index of length 2, hence the error message.
That is to say, your code is equivalent to:

In [21]: df = pd.DataFrame(np.random.randn(10,2), index=np.arange(9), columns=np.arange(2))
ValueError: Shape of passed values is (2, 10), indices imply (2, 9)

What you want is:

df = pd.DataFrame(np.random.randn(10,2), index=np.arange(10), columns=np.arange(2))
# equivalently
df = pd.DataFrame(np.random.randn(10,2), index=np.arange(10))
df = pd.DataFrame(np.random.randn(10,2))
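If it helps, here is a small self-contained sketch of the failing and the working construction side by side. The names data, message and fixed are just for illustration, and I would not rely on the exact wording of the message, since the order in which the dimensions are printed may differ between pandas versions (the question reports 0.17.1):

```python
import numpy as np
import pandas as pd

data = np.random.randn(10, 2)

# An index of length 9 cannot label 10 rows, so construction fails.
try:
    pd.DataFrame(data, index=range(9))
    message = None
except ValueError as exc:
    message = str(exc)
print(message)

# Deriving the index length from the data keeps the two in sync.
fixed = pd.DataFrame(data, index=range(data.shape[0]))
print(fixed.shape)
```

Building the index from data.shape[0] rather than a hard-coded number avoids this kind of off-by-one mismatch.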

Upvotes: 1
