Reputation: 608
I have a DataFrame with a single row and need to reduce it to a smaller one that keeps only some of the columns, chosen by their values in that row.
What's the most efficient way to do this?
df = pd.DataFrame({'a':[1], 'b':[10], 'c':[3], 'd':[5]})
a | b | c | d |
---|---|---|---|
1 | 10 | 3 | 5 |
For example, keeping the top-3 features should give:
b | c | d |
---|---|---|
10 | 3 | 5 |
Upvotes: 3
Views: 3272
Reputation: 531
You can use np.argsort. Applied to the negated values, as in the code below, this NumPy function returns the column positions ordered from the largest value to the smallest. Slicing then keeps the positions of the n largest values.
import pandas as pd
import numpy as np
# Your dataframe
df = pd.DataFrame({'a':[1], 'b':[10], 'c':[3], 'd':[5]})
# Pick the number n to find n largest values
nlargest = 3
# Get the order of the largest value columns by their indices
order = np.argsort(-df.values, axis=1)[:, :nlargest]
# Find the columns with the largest values
# (index with the 1-D row order[0]; recent pandas versions reject a 2-D indexer on an Index)
top_features = df.columns[order[0]].tolist()
# Filter the DataFrame by the columns
top_features_df = df[top_features]
top_features_df
Output:
    b  d  c
0  10  5  3
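The steps above can be wrapped into a small helper. This is a minimal sketch, assuming a single-row DataFrame with numeric values; the name top_n_columns is hypothetical and not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd


def top_n_columns(df, n):
    """Return a single-row frame with the n largest-valued columns (hypothetical helper)."""
    # Negate the first row so that argsort orders positions from largest to smallest
    order = np.argsort(-df.to_numpy()[0])[:n]
    # Positional selection keeps the columns in descending order of their values
    return df.iloc[:, order]


df = pd.DataFrame({'a': [1], 'b': [10], 'c': [3], 'd': [5]})
result = top_n_columns(df, 3)
print(result.columns.tolist())
```

Selecting by position with iloc avoids a second lookup by column label, which may matter if column names are duplicated.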
Upvotes: 1
Reputation: 863801
Sort the columns by the values in the row, in descending order, and select the first 3:
df1 = df.sort_values(0, axis=1, ascending=False).iloc[:, :3]
print(df1)
    b  d  c
0  10  5  3
Solution with Series.nlargest:
df1 = df.iloc[0].nlargest(3).to_frame().T
print(df1)
    b  d  c
0  10  5  3
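The question's expected output lists the columns in their original order (b, c, d), whereas both solutions above order them by value. If the original order matters, one option is to use the labels returned by nlargest as a mask over df.columns. A minimal sketch, in which top and df_ordered are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'a': [1], 'b': [10], 'c': [3], 'd': [5]})

# Labels of the 3 largest values in the single row
top = df.iloc[0].nlargest(3).index
# A boolean mask over df.columns keeps the original left-to-right column order
df_ordered = df.loc[:, df.columns.isin(top)]
print(df_ordered.columns.tolist())
```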
Upvotes: 3
Reputation: 13841
You can transpose with T and use nlargest():
new = df.T.nlargest(columns = 0, n = 3).T
print(new)
    b  d  c
0  10  5  3
Upvotes: 2