Reputation: 5324
My understanding is that pd.DataFrame().shape returns (n_rows, n_columns).
However, when I construct a DataFrame whose index does not match the shape of the data, pandas raises a ValueError that reports the shape as (n_columns, n_rows).
Example:
df_2 = pd.DataFrame(np.random.randn(10,2), index = range(9))
ValueError: Shape of passed values is (2, 10), indices imply (2, 9)
Why does the ValueError not say:
Shape of passed values is (10, 2), indices imply (9, 2)
Pandas Version: '0.17.1'
Upvotes: 4
Views: 3646
Reputation: 167
print range(9) returns [0, 1, 2, 3, 4, 5, 6, 7, 8], so you are giving a (10, 2) array (a "10 by 2" array: 20 values in two columns of 10) an index that is a one-dimensional array of only 9 values starting at zero. That index won't fit the dimensions of the NumPy array you're converting to a pandas DataFrame.
Upvotes: 0
Reputation: 375915
When pandas says "indices" here it means the index and the columns (they are both of type Index).
In [11]: df = pd.DataFrame(np.random.randn(3,2))
In [12]: df.index
Out[12]: Int64Index([0, 1, 2], dtype='int64')
In [13]: df.columns
Out[13]: Int64Index([0, 1], dtype='int64')
You are in effect passing a length 9 .index Index and a length 2 .columns Index, hence the error message.
That is, your code is equivalent to:
In [21]: df = pd.DataFrame(np.random.randn(10,2), index=np.arange(9), columns=np.arange(2))
ValueError: Shape of passed values is (2, 10), indices imply (2, 9)
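To make the mismatch concrete, here is a minimal sketch that compares the shape of the data with the shape implied by the two Index objects. The names data, index, columns and implied_shape are only illustrative. The sketch deliberately does not rely on the text of the error message, since the wording (and the order in which the two numbers are printed) may differ between pandas versions.

```python
import numpy as np
import pandas as pd

data = np.random.randn(10, 2)
index = np.arange(9)     # one label short: the data has 10 rows
columns = np.arange(2)   # matches the 2 columns of the data

# The two Index objects together imply a (rows, columns) shape.
implied_shape = (len(index), len(columns))

try:
    pd.DataFrame(data, index=index, columns=columns)
    error = None
except ValueError as exc:
    # Raised because data.shape and implied_shape disagree.
    error = exc

print("data shape:   ", data.shape)
print("implied shape:", implied_shape)
print("raised:       ", type(error).__name__)
```

Whatever order the message uses, the underlying check is the same: 10 rows of data against 9 index labels.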
What you want is:
df = pd.DataFrame(np.random.randn(10,2), index=np.arange(10), columns=np.arange(2))
# equivalently
df = pd.DataFrame(np.random.randn(10,2), index=np.arange(10))
df = pd.DataFrame(np.random.randn(10,2))
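As a quick check that these three forms give the same result, the sketch below builds all of them from one fixed array (so the random values are shared) and compares shapes, labels and values. The variable names explicit, index_only and default are assumptions made for the example.

```python
import numpy as np
import pandas as pd

data = np.random.randn(10, 2)

explicit = pd.DataFrame(data, index=np.arange(10), columns=np.arange(2))
index_only = pd.DataFrame(data, index=np.arange(10))
default = pd.DataFrame(data)

for frame in (explicit, index_only, default):
    # shape is (n_rows, n_columns), i.e. (len(index), len(columns))
    assert frame.shape == (len(frame.index), len(frame.columns)) == (10, 2)
    # the labels are 0..9 for the rows and 0..1 for the columns
    assert list(frame.index) == list(range(10))
    assert list(frame.columns) == [0, 1]
    # the values are exactly the array that was passed in
    assert np.array_equal(frame.values, data)

print(default.shape)
```

If you omit index and columns, pandas fills in default integer labels that match the data, which is why the shorter forms cannot run into this error.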
Upvotes: 1