marz

Reputation: 973

Numpy vs Pandas axis

Why does axis mean something different in NumPy and pandas?

Example:

If I want to get rid of a column in pandas, I can do this:

df.drop("column", axis=1, inplace=True)

Here, we are using axis = 1 to drop a column (vertically in a DF).

In Numpy, if I want to sum a matrix A vertically I would use:

A.sum(axis = 0)

Here I use axis = 0.

Upvotes: 0

Views: 638

Answers (2)

hpaulj

Reputation: 231615

axis isn't used that often in pandas. A DataFrame has two dimensions, which are often treated quite differently. For drop, the axis definition is well documented, and it actually corresponds to the numpy usage.

Make a simple array and data frame (with numpy imported as np and pandas as pd):

In [180]: x = np.arange(9).reshape(3,3)                                         
In [181]: df = pd.DataFrame(x)                                                  
In [182]: df                                                                    
Out[182]: 
   0  1  2
0  0  1  2
1  3  4  5
2  6  7  8

Delete a row from the array, or a column:

In [183]: np.delete(x, 1, 0)                                                    
Out[183]: 
array([[0, 1, 2],
       [6, 7, 8]])
In [184]: np.delete(x, 1, 1)                                                    
Out[184]: 
array([[0, 2],
       [3, 5],
       [6, 8]])

Drop does the same thing for the same axis:

In [185]: df.drop(1, axis=0)                                                    
Out[185]: 
   0  1  2
0  0  1  2
2  6  7  8
In [186]: df.drop(1, axis=1)                                                    
Out[186]: 
   0  2
0  0  2
1  3  5
2  6  8

For sum, the definitions are the same as well:

In [188]: x.sum(axis=0)                                                         
Out[188]: array([ 9, 12, 15])
In [189]: df.sum(axis=0)                                                        
Out[189]: 
0     9
1    12
2    15
dtype: int64
In [190]: x.sum(axis=1)                                                         
Out[190]: array([ 3, 12, 21])
In [191]: df.sum(axis=1)                                                        
Out[191]: 
0     3
1    12
2    21
dtype: int64

The pandas sums are Series, which are the pandas equivalent of a 1d array.
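As a quick check of that equivalence, here is a minimal sketch that reuses the x and df from above; the name row_sums is only illustrative. It converts the Series back to a plain array and compares it with the numpy reduction over the same axis.

```python
import numpy as np
import pandas as pd

x = np.arange(9).reshape(3, 3)
df = pd.DataFrame(x)

# Reducing over axis=1 in pandas gives a Series indexed by the row labels
row_sums = df.sum(axis=1)
print(type(row_sums).__name__)

# The values behind the Series form a 1d array
values = row_sums.to_numpy()
print(values.ndim, values)

# Same numbers as the numpy reduction over the same axis
print(np.array_equal(values, x.sum(axis=1)))
```

The Series additionally carries an index (here the row labels 0, 1, 2), which is the main thing a bare 1d array lacks.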

Visualizing what axis does in reduction operations like sum is a bit tricky, especially with 2d arrays. Is the axis kept or removed? It is removed: the result has every dimension except the one named. It can help to think about axis for 1d arrays (the only axis is removed), or 3d arrays, where one axis is removed, leaving two.
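Looking at shapes makes this easier to see than looking at values. The following is a rough sketch with made-up arrays v and a (not from the session above); it also uses keepdims=True, a standard numpy option that keeps the reduced axis with length 1 instead of removing it.

```python
import numpy as np

v = np.arange(4)                     # 1d, shape (4,)
a = np.arange(24).reshape(2, 3, 4)   # 3d, shape (2, 3, 4)

total = v.sum(axis=0)                # the only axis is removed, leaving a scalar
s0 = a.sum(axis=0)                   # axis 0 removed -> shape (3, 4)
s1 = a.sum(axis=1)                   # axis 1 removed -> shape (2, 4)
s2 = a.sum(axis=2)                   # axis 2 removed -> shape (2, 3)

# keepdims=True keeps the reduced axis as a length-1 dimension
k1 = a.sum(axis=1, keepdims=True)    # shape (2, 1, 4)

print(np.shape(total), s0.shape, s1.shape, s2.shape, k1.shape)
```

In every case the axis named in the call is the one that disappears from the shape (or shrinks to 1 with keepdims).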

Upvotes: 1

Hugolmn

Reputation: 1560

When you drop a column, its name is looked up along axis 1, the horizontal axis that holds the column labels. When you sum along axis 0, you add values vertically, down the rows, which leaves one total per column.
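A small sketch of both readings, assuming a hypothetical frame with column labels "a", "b", "c" (these names are not from the question). For drop, axis says which set of labels to search; for sum, axis says which direction is collapsed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=["a", "b", "c"])

# drop: "b" is a column label, so it is searched for along axis 1
dropped = df.drop("b", axis=1)

# columns="b" is an equivalent spelling that avoids the axis argument
same = df.drop(columns="b")

# sum: axis=0 collapses the rows, leaving one total per column
col_totals = df.sum(axis=0)

print(list(dropped.columns))
print(dropped.equals(same))
print(col_totals.to_dict())
```

Searching for "b" along axis 0 instead would fail with a KeyError, because "b" is not a row label.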

Upvotes: 0
