normanius

Reputation: 9762

How to choose row-index or column-index of pandas table by axis

In pandas tables, the row index and the column index have a very similar interface, and some operations can work along either rows or columns simply by passing an axis parameter (for example sort_index, and many more).

But how can I access (read and write) either row-index or column-index by specifying the axis?

# Instead of this
if axis == 0:
    table.index = some_function(table.index)
else:
    table.columns = some_function(table.columns)

# I would like to simply write:
newIndex = some_function(table.get_index_by_axis(axis))
table.set_index_by_axis(newIndex, axis=axis)

Does something like get_index_by_axis and set_index_by_axis exist?

Update: Data frames have an attribute axes that lets you select an axis by its number. However, it is effectively read-only: assigning a new value to one of its elements has no effect on the table.

index = table.axes[axis]         # Read an index
newIndex = some_function(index)  
table.axes[axis] = newIndex      # This has no effect on table.
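
A minimal sketch of why that assignment is lost, assuming (as far as I can tell from current pandas versions) that axes builds a fresh list on every access, so the write only lands in a temporary list. The small frame and the new labels are made up for illustration:

```python
import pandas as pd

# Hypothetical example frame, not taken from the question.
table = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Each access to .axes returns a new list object.
first = table.axes
second = table.axes
same_list = first is second

# Writing into that list changes only the list, not the frame.
first[1] = pd.Index(['x', 'y'])
columns_after = list(table.columns)

print(same_list)
print(columns_after)
```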

Upvotes: 1

Views: 703

Answers (2)

normanius

Reputation: 9762

Use pd.DataFrame.set_axis():

import pandas as pd 

def apply_axis(df, axis, func):
    old_index = df.axes[axis]
    new_index = old_index.map(func)
    df = df.set_axis(new_index, axis=axis)
    return df

def some_function(x):
    return x+x

df = pd.DataFrame({'a': [1,2,3],
                   'b': [10,20,30],
                   'c': [100,200,300],
                   'd': [1000,2000,3000]})
#    a   b    c     d
# 0  1  10  100  1000
# 1  2  20  200  2000
# 2  3  30  300  3000

ret = apply_axis(df=df, axis=0, func=some_function)
#    a   b    c     d
# 0  1  10  100  1000
# 2  2  20  200  2000
# 4  3  30  300  3000

ret = apply_axis(df=df, axis=1, func=some_function)
#   aa  bb   cc    dd
# 0  1  10  100  1000
# 1  2  20  200  2000
# 2  3  30  300  3000
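
When the new labels are computed label by label, as above, DataFrame.rename with a callable and an axis argument might be a shorter alternative. This is a different technique from set_axis, shown here only as a sketch with a smaller, made-up frame; like set_axis in the example above, it returns a new frame and leaves the original untouched:

```python
import pandas as pd

def some_function(x):
    # Same toy transformation as above: doubles ints, repeats strings.
    return x + x

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# rename applies the callable to every label of the chosen axis.
by_rows = df.rename(some_function, axis=0)
by_cols = df.rename(some_function, axis=1)

print(list(by_rows.index))
print(list(by_cols.columns))
```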

Upvotes: 0

Vermillion

Reputation: 1308

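I looked into the pandas source code to see how the axis keyword is used. There's a private method _get_axis_name that takes the axis as a parameter and returns the name of the matching attribute. Since it is private (note the leading underscore), it may change between pandas versions.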

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

Pass in the axis parameter:

>>> df._get_axis_name(axis=0)
'index'

>>> df._get_axis_name(axis=1)
'columns'

You can use this with getattr or setattr.

>>> getattr(df, df._get_axis_name(axis=0))
RangeIndex(start=0, stop=3, step=1)

>>> getattr(df, df._get_axis_name(axis=1))
Index(['A', 'B'], dtype='object')
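
For the write direction, a sketch along the same lines could use setattr. The helper name set_index_by_axis is borrowed from the question and is hypothetical (it is not part of pandas), and the code relies on the private _get_axis_name method, so treat it as an assumption rather than a stable API:

```python
import pandas as pd

def set_index_by_axis(df, new_index, axis=0):
    # Hypothetical helper: resolve 'index' or 'columns' from the axis,
    # then assign the new labels to that attribute in place.
    name = df._get_axis_name(axis)
    setattr(df, name, new_index)

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

set_index_by_axis(df, ['x', 'y', 'z'], axis=0)  # replaces the row labels
set_index_by_axis(df, ['a', 'b'], axis=1)       # replaces the column labels

print(list(df.index))
print(list(df.columns))
```

Unlike set_axis, this modifies df in place rather than returning a new frame.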

Upvotes: 2
