Web Development Labs

Reputation: 431

What does (x,) denote in NumPy shapes?

I have been struggling to find out what exactly (x,) denotes in NumPy shapes. From its appearance, I understand that it says the array has x elements, i.e. it is a 1D array.

But my question is what does the COMMA denote after x here (x,)? I am asking because I am trying to create a DataFrame and it raises an error:

ValueError: Shape of passed values is (3, 1), indices imply (1, 3)

My code:

import numpy as np
import pandas as pd

price = np.array([10, 8, 12])

df_price = pd.DataFrame(price, 
                        index=(["Price"]),
                        columns=(["Almond Butter","Peanut Butter", "Cashew Butter"]))

Could anyone tell me why the shape of this "price" array is reported as (3, 1) here? It is not (3, 1); it is (3,).

Upvotes: 4

Views: 771

Answers (3)

Code-Apprentice

Reputation: 83527

my question is what does the COMMA denote after x here (x,)?

This syntax is generic Python and not specific to NumPy. We add a comma in this situation when we want to create a tuple. You should be familiar with tuples like (3, 4). What if we want to create a tuple with one element, though? You can try (3), but Python interprets those parentheses as grouping in an expression, the same way it does in (3 + 4) * 5. This means that (3) is just the integer 3, not a tuple. So we add a comma, (3,), to create a tuple with a single element.
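
A minimal sketch of the difference, using the price array from the question. The names grouped and single are only illustrative:

```python
import numpy as np

# Parentheses alone only group an expression; the trailing comma makes a tuple.
grouped = (3)
single = (3,)

print(type(grouped))  # <class 'int'>
print(type(single))   # <class 'tuple'>

# A NumPy shape is an ordinary tuple with one entry per dimension,
# so a 1D array of three elements has the one-element tuple (3,) as its shape.
price = np.array([10, 8, 12])
print(price.shape)       # (3,)
print(len(price.shape))  # 1
```

The length of the shape tuple is the number of dimensions, which is why (3,) means 1D and (3, 1) means 2D.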

Upvotes: 1

hpaulj

Reputation: 231375

The full traceback of your error indicates that DataFrame has done quite a bit of processing of your input.

In [336]: pd.DataFrame(np.arange(1,4),  
     ...:                         index=(["Price"]), 
     ...:                         columns=(["Almond Butter","Peanut Butter", "Cashew Butter"]))      
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py in create_block_manager_from_blocks(blocks, axes)
   1653                 blocks = [
-> 1654                     make_block(values=blocks[0], placement=slice(0, len(axes[0])))
   1655                 ]

/usr/local/lib/python3.6/dist-packages/pandas/core/internals/blocks.py in make_block(values, placement, klass, ndim, dtype)
   3052 
-> 3053     return klass(values, ndim=ndim, placement=placement)
   3054 

/usr/local/lib/python3.6/dist-packages/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)
    124             raise ValueError(
--> 125                 f"Wrong number of items passed {len(self.values)}, "
    126                 f"placement implies {len(self.mgr_locs)}"

ValueError: Wrong number of items passed 1, placement implies 3

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-336-43d59803fb0f> in <module>
      1 pd.DataFrame(np.arange(1,4), 
      2                         index=(["Price"]),
----> 3                         columns=(["Almond Butter","Peanut Butter", "Cashew Butter"]))

/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    462                 mgr = init_dict({data.name: data}, index, columns, dtype=dtype)
    463             else:
--> 464                 mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
    465 
    466         # For data is list-like, or Iterable (will consume into list)

/usr/local/lib/python3.6/dist-packages/pandas/core/internals/construction.py in init_ndarray(values, index, columns, dtype, copy)
    208         block_values = [values]
    209 
--> 210     return create_block_manager_from_blocks(block_values, [columns, index])
    211 
    212 

/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py in create_block_manager_from_blocks(blocks, axes)
   1662         blocks = [getattr(b, "values", b) for b in blocks]
   1663         tot_items = sum(b.shape[0] for b in blocks)
-> 1664         construction_error(tot_items, blocks[0].shape[1:], axes, e)
   1665 
   1666 

/usr/local/lib/python3.6/dist-packages/pandas/core/internals/managers.py in construction_error(tot_items, block_shape, axes, e)
   1692     if block_shape[0] == 0:
   1693         raise ValueError("Empty data passed with indices specified.")
-> 1694     raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
   1695 
   1696 

ValueError: Shape of passed values is (3, 1), indices imply (1, 3)

If we don't specify indices, it produces a frame with a single column:

In [337]: pd.DataFrame(np.arange(1,4))      # (3,) input                                                         
Out[337]: 
   0
0  1
1  2
2  3

same as (3,1) input:

In [339]: pd.DataFrame(np.arange(1,4)[:,None])   # (3,1) input                                                    
Out[339]: 
   0
0  1
1  2
2  3

but you wanted a (1,3):

In [340]: pd.DataFrame(np.arange(1,4)[None,:])  # (1,3) input                                                     
Out[340]: 
   0  1  2
0  1  2  3

numpy broadcasting can expand a (3,) array to (1,3), but that's not what DataFrame is doing.
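
For comparison, here is a small sketch of what NumPy broadcasting does with a (3,) array. The names a, row and col are illustrative only; the point is that broadcasting adds the new dimension at the front, while the error message shows the DataFrame constructor ending up with (3, 1):

```python
import numpy as np

a = np.arange(1, 4)              # shape (3,)
row = np.zeros((1, 3), dtype=int)
col = np.zeros((3, 1), dtype=int)

# Broadcasting prepends dimensions, so (3,) behaves like (1, 3).
print(np.broadcast_to(a, (1, 3)).shape)  # (1, 3)
print((a + row).shape)                   # (1, 3)

# Against a (3, 1) column the (3,) array is still treated as a row,
# and the result is the full (3, 3) grid.
print((a + col).shape)                   # (3, 3)
```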

Depending on how you look at it, a pandas dataframe can appear to be a transpose of a 2d numpy array. A Series is 1d, but displays vertically. And dataframe indexing gives priority to columns. I've also seen transposes when exploring the connection between the underlying data storage and output of values/to_numpy(). The details are complicated. Notice that the traceback talks about a 'block_manager' etc.

In [342]: pd.Series(np.arange(1,4))                                                                  
Out[342]: 
0    1
1    2
2    3
dtype: int64

Upvotes: 1

jkr

Reputation: 19250

When creating a pandas DataFrame from a flat array, the array must be transformed to some 2D form, because DataFrames are 2D.

The issue arises because you have one row and three columns, so the data array is expected to have the shape (1, 3). Judging by the error message, the pd.DataFrame constructor adds a dimension to the end of the array, giving (3, 1), and treats each item along the first dimension as a row in the DataFrame.

A simple fix for this is to reshape your data array to the number of rows by the number of columns.

price = np.array([10, 8, 12]).reshape(1, -1)

The -1 in the .reshape call above tells the function to infer the length of that axis.
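
Putting the reshape together with the constructor call from the question gives a complete sketch like the following. The index and column labels are taken from the question; nothing else is assumed:

```python
import numpy as np
import pandas as pd

# Reshape the flat (3,) array into one row of three columns.
price = np.array([10, 8, 12]).reshape(1, -1)
print(price.shape)  # (1, 3)

df_price = pd.DataFrame(
    price,
    index=["Price"],
    columns=["Almond Butter", "Peanut Butter", "Cashew Butter"],
)
print(df_price.shape)  # (1, 3)
print(df_price.loc["Price", "Peanut Butter"])  # 8
```

Passing a nested list such as [[10, 8, 12]] to np.array would give the same (1, 3) shape without the reshape call.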

Upvotes: 4
