Reputation: 608

Pandas: Get top n columns based on a row's values

I have a dataframe with a single row and need to reduce it to a smaller one that keeps only the columns with the largest values in that row.
What's the most efficient way to do this?

df = pd.DataFrame({'a':[1], 'b':[10], 'c':[3], 'd':[5]})

a	b	c	d
1	10	3	5

For example, the top 3 features:

b	c	d
10	3	5

Upvotes: 3

Answers (3)

hsaltan

Reputation: 531

You can use np.argsort for this. Applied to the negated values, as in the code below, it returns the column positions ordered from the largest value to the smallest. Slicing then keeps the positions of the n largest values.

import pandas as pd
import numpy as np

# Your dataframe
df = pd.DataFrame({'a':[1], 'b':[10], 'c':[3], 'd':[5]})

# Pick the number n to find n largest values
nlargest = 3

# Get the order of the largest value columns by their indices
order = np.argsort(-df.values, axis=1)[:, :nlargest]

# Find the columns with the largest values
# (order[0] is 1-D; newer pandas versions reject 2-D indexing of an Index)
top_features = df.columns[order[0]].tolist()

# Filter the dataframe by the columns
top_features_df = df[top_features]

top_features_df

output:

    b   d   c
0   10  5   3
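
To make the argsort step easier to follow, here is a small sketch that prints the intermediate positions for the same example frame. The names full_order, top_positions and top_columns are only illustrative and are not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [10], 'c': [3], 'd': [5]})

# argsort sorts ascending, so negating the values yields descending order.
full_order = np.argsort(-df.to_numpy(), axis=1)
print(full_order)  # [[1 3 2 0]]

# Keep the positions of the 3 largest values in the first (only) row.
top_positions = full_order[0, :3]

# Map the positions back to column labels.
top_columns = df.columns[top_positions].tolist()
print(top_columns)  # ['b', 'd', 'c']
```

Position 1 is column b (10), position 3 is d (5) and position 2 is c (3), which matches the output shown above.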

Upvotes: 1

jezrael

Reputation: 863801

Sort the columns by the values in the row and select the first 3:

df1 = df.sort_values(0, axis=1, ascending=False).iloc[:, :3]
print (df1)
    b  d  c
0  10  5  3

Solution with Series.nlargest:

df1 = df.iloc[0].nlargest(3).to_frame().T
print (df1)
    b  d  c
0  10  5  3
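
Both variants return the columns ordered by value (b, d, c), while the expected output in the question keeps the original column order (b, c, d). If that order matters, one possible sketch is to use Series.nlargest only to pick the labels and then filter with a boolean mask. The names top_labels and df_ordered are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [10], 'c': [3], 'd': [5]})

# Labels of the 3 largest values in the first row, ordered by value.
top_labels = df.iloc[0].nlargest(3).index

# A boolean mask over the columns keeps the original left-to-right order.
df_ordered = df.loc[:, df.columns.isin(top_labels)]
print(df_ordered)
```

This should print the columns as b, c, d with the values 10, 3, 5.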

Upvotes: 3

sophocles

Reputation: 13841

You can transpose with .T and use nlargest():

new = df.T.nlargest(columns = 0, n = 3).T

print(new)

    b  d  c
0  10  5  3
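
After the transpose, the columns argument of nlargest refers to the label of the original row. As a rough sketch, assuming a hypothetical two-row frame with row labels 'x' and 'y' (not from the question), the same pattern can rank the columns by any chosen row:

```python
import pandas as pd

# Hypothetical frame with two labelled rows, for illustration only.
df2 = pd.DataFrame(
    {'a': [1, 7], 'b': [10, 2], 'c': [3, 9], 'd': [5, 4]},
    index=['x', 'y'],
)

# Transposing turns the row labels into column names, so 'y' selects
# the row whose values are used for ranking.
top_by_y = df2.T.nlargest(n=2, columns='y').T
print(top_by_y)
```

Here the two columns with the largest values in row 'y' are c (9) and a (7), and both rows are kept for those columns.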

Upvotes: 2
