Reputation: 431
I have been struggling to find out what exactly (x,) denotes in NumPy shapes. From its appearance, I can tell it means there are x columns/elements in the array, which is basically a 1D array.
But my question is: what does the COMMA after x in (x,) denote? I am asking because I am trying to create a DataFrame and it is giving me an error:
ValueError: Shape of passed values is (3, 1), indices imply (1, 3)
My code:
import numpy as np
import pandas as pd
price = np.array([10, 8, 12])
df_price = pd.DataFrame(price,
                        index=["Price"],
                        columns=["Almond Butter", "Peanut Butter", "Cashew Butter"])
Could anyone tell me why this price array's shape is reported as (3, 1) here? It is not; its shape is just (3,).
Upvotes: 4
Views: 771
Reputation: 83527
my question is what does the COMMA denote after x here (x,)?
This syntax is generic Python and not specific to NumPy. We add a comma in this situation when we want to create a tuple. You should be familiar with tuples like (3, 4). What if we want to create a tuple with one element, though? You can try (3), but Python then interprets the parentheses as a grouping operator in a mathematical expression, in the same way as when we use them in (3 + 4) * 5. This means that (3) is just the integer value 3, not a tuple. So we add a comma, (3,), to create a tuple with a single element.
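A quick interpreter check makes the difference concrete. This is a minimal sketch using only built-ins plus NumPy's `shape` and `ndim` attributes; the variable names are illustrative.

```python
import numpy as np

# Parentheses alone just group an expression, so this is the int 3
not_a_tuple = (3)
# The trailing comma is what makes a one-element tuple
one_tuple = (3,)
# The comma matters even without parentheses
also_tuple = 3,

price = np.array([10, 8, 12])

print(type(not_a_tuple).__name__)  # int
print(type(one_tuple).__name__)    # tuple
print(also_tuple == one_tuple)     # True
# A 1D array's shape is a one-element tuple: one axis of length 3
print(price.shape, len(price.shape), price.ndim)  # (3,) 1 1
```

So (3,) in a NumPy shape is simply how Python prints a tuple with one entry: the array has one axis, of length 3.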
Upvotes: 1
Reputation: 231375
The full traceback of your error indicates that DataFrame has done quite a bit of processing of your input.
In [336]: pd.DataFrame(np.arange(1,4),
...: index=(["Price"]),
...: columns=(["Almond Butter","Peanut Butter", "Cashew Butter"]))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py in create_block_manager_from_blocks(blocks, axes)
1653 blocks = [
-> 1654 make_block(values=blocks[0], placement=slice(0, len(axes[0])))
1655 ]
/usr/local/lib/python3.6/dist-packages/pandas/core/internals/blocks.py in make_block(values, placement, klass, ndim, dtype)
3052
-> 3053 return klass(values, ndim=ndim, placement=placement)
3054
/usr/local/lib/python3.6/dist-packages/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)
124 raise ValueError(
--> 125 f"Wrong number of items passed {len(self.values)}, "
126 f"placement implies {len(self.mgr_locs)}"
ValueError: Wrong number of items passed 1, placement implies 3
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-336-43d59803fb0f> in <module>
1 pd.DataFrame(np.arange(1,4),
2 index=(["Price"]),
----> 3 columns=(["Almond Butter","Peanut Butter", "Cashew Butter"]))
/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
462 mgr = init_dict({data.name: data}, index, columns, dtype=dtype)
463 else:
--> 464 mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
465
466 # For data is list-like, or Iterable (will consume into list)
/usr/local/lib/python3.6/dist-packages/pandas/core/internals/construction.py in init_ndarray(values, index, columns, dtype, copy)
208 block_values = [values]
209
--> 210 return create_block_manager_from_blocks(block_values, [columns, index])
211
212
/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py in create_block_manager_from_blocks(blocks, axes)
1662 blocks = [getattr(b, "values", b) for b in blocks]
1663 tot_items = sum(b.shape[0] for b in blocks)
-> 1664 construction_error(tot_items, blocks[0].shape[1:], axes, e)
1665
1666
/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py in construction_error(tot_items, block_shape, axes, e)
1692 if block_shape[0] == 0:
1693 raise ValueError("Empty data passed with indices specified.")
-> 1694 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
1695
1696
ValueError: Shape of passed values is (3, 1), indices imply (1, 3)
If we don't specify index and columns, it produces a single-column frame:
In [337]: pd.DataFrame(np.arange(1,4)) # (3,) input
Out[337]:
0
0 1
1 2
2 3
This is the same as with a (3,1) input:
In [339]: pd.DataFrame(np.arange(1,4)[:,None]) # (3,1) input
Out[339]:
0
0 1
1 2
2 3
But you wanted a (1,3) layout:
In [340]: pd.DataFrame(np.arange(1,4)[None,:]) # (1,3) input
Out[340]:
0 1 2
0 1 2 3
NumPy broadcasting can expand a (3,) array to (1,3), but that's not what DataFrame is doing.
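To illustrate the distinction, here is a small sketch (assuming a reasonably recent NumPy and pandas) comparing broadcasting, which treats a (3,) array like (1, 3), with the DataFrame constructor, which turns the same 1D input into a single column. That (3, 1) layout is where the "(3, 1)" in the error message comes from. The exact wording of the error may vary between pandas versions, so the snippet only checks that a `ValueError` is raised.

```python
import numpy as np
import pandas as pd

a = np.arange(1, 4)  # shape (3,)
labels = ["Almond Butter", "Peanut Butter", "Cashew Butter"]

# Broadcasting aligns shapes from the right, so (3,) behaves like (1, 3)
row = np.broadcast_to(a, (1, 3))
summed = a + np.zeros((1, 3))

# The DataFrame constructor instead treats a 1D array as one column
col_frame = pd.DataFrame(a)

print(row.shape, summed.shape)  # (1, 3) (1, 3)
print(col_frame.shape)          # (3, 1)

# Hence a (3,) array with 1 index label and 3 column labels fails
raised = False
try:
    pd.DataFrame(a, index=["Price"], columns=labels)
except ValueError as err:
    raised = True
    print("ValueError:", err)
```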
Depending on how you look at it, a pandas DataFrame can appear to be a transpose of a 2D NumPy array. A Series is 1D but displays vertically, and DataFrame indexing gives priority to columns. I've also seen transposes when exploring the connection between the underlying data storage and the output of values/to_numpy(). The details are complicated; notice that the traceback talks about a 'block_manager' etc.
In [342]: pd.Series(np.arange(1,4))
Out[342]:
0 1
1 2
2 3
dtype: int64
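The column-first behaviour can be seen directly. In this sketch (labels taken from the question), square-bracket indexing on a DataFrame selects a column, and each column comes back as a 1D Series whose shape is again a one-element tuple.

```python
import numpy as np
import pandas as pd

labels = ["Almond Butter", "Peanut Butter", "Cashew Butter"]
# A (1, 3) input: one row, three columns
df = pd.DataFrame(np.arange(1, 4)[None, :], columns=labels)

# df[...] looks up a column label, not a row
peanut = df["Peanut Butter"]
print(type(peanut).__name__, peanut.shape)  # Series (1,)

# A Series built from a (3,) array is 1D; it just displays vertically
s = pd.Series(np.arange(1, 4))
print(s.shape)     # (3,)
# Transposing the frame turns the single row into a single column
print(df.T.shape)  # (3, 1)
```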
Upvotes: 1
Reputation: 19250
When trying to create a pandas DataFrame from a flat array, the array must be transformed into some 2D form, because pandas DataFrames are almost always 2D.
The issue arises because you have one row and three columns, so the data array is expected to have the shape (1, 3). The pd.DataFrame constructor appears to add a dimension at the end of the array, turning (3,) into (3, 1), and treats each item along the first dimension as a row in the DataFrame.
A simple fix for this is to reshape your data array to the number of rows by the number of columns.
price = np.array([10, 8, 12]).reshape(1, -1)
The -1 in the .reshape call above tells the function to infer the length of that axis.
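Putting the fix together with the question's data gives a complete, runnable version. `price[np.newaxis, :]` is shown as an equivalent way to add the leading axis; either form produces the (1, 3) array the constructor expects for one index label and three column labels.

```python
import numpy as np
import pandas as pd

price = np.array([10, 8, 12])   # shape (3,)
row = price.reshape(1, -1)      # shape (1, 3); -1 is inferred as 3
row_alt = price[np.newaxis, :]  # same (1, 3) shape via indexing

df_price = pd.DataFrame(
    row,
    index=["Price"],
    columns=["Almond Butter", "Peanut Butter", "Cashew Butter"],
)

print(row.shape, row_alt.shape)  # (1, 3) (1, 3)
print(df_price)
```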
Upvotes: 4