Reputation: 23
I am looking to find the 3 largest values for each row in my DataFrame, but only from certain columns (there are 10 columns in total, but only 6 of them should be considered when calculating the largest values). Once the 3 largest are found, I want to add three new columns to my existing DataFrame called 'Top1', 'Top2' and 'Top3'. I am using pandas in Python.
This is my code:
df_2 = pd.DataFrame(
df_1.apply(
lambda x: list(
df_1.columns[np.array(x).argsort()[::-1][:3]]
), axis=1
).to_list(), columns=['Top1', 'Top2', 'Top3']
)
I am getting an error message because this code considers my whole dataset, when I only want to look at the columns ['t1', 't2', 't3', 't4', 't5', 't6']. Where would I enter this specification in my code?
Upvotes: 0
Views: 51
Reputation: 24322
Try a list comprehension with the pd.DataFrame() constructor and the nlargest() method:
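col = ['t1', 't2', 't3', 't4', 't5', 't6']
# Call nlargest on each row (not each column) to get that row's three largest values
out = pd.DataFrame([row.nlargest(3).to_list() for _, row in df_1[col].iterrows()],
                   columns=['Top1', 'Top2', 'Top3'], index=df_1.index)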
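Or modify your own code slightly by selecting the columns before calling apply (this returns the names of the top 3 columns for each row):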
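df_2 = pd.DataFrame(
    df_1[col].apply(
        # x.index holds only the selected column names, so the positions
        # returned by argsort line up with the right labels
        lambda x: list(
            x.index[np.array(x).argsort()[::-1][:3]]
        ), axis=1
    ).to_list(), columns=['Top1', 'Top2', 'Top3']
)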
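Since the question asks for the new columns on the existing DataFrame, here is a minimal sketch of one way to attach them. The sample data and the 'id' column are made up for illustration, and it assumes the six columns are numeric with no ties:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'id' stands in for the columns that should be ignored.
df_1 = pd.DataFrame({
    'id': [101, 102],
    't1': [5, 1], 't2': [9, 8], 't3': [2, 7],
    't4': [7, 3], 't5': [1, 9], 't6': [4, 2],
})
col = ['t1', 't2', 't3', 't4', 't5', 't6']

# Rank only the selected columns; result_type='expand' turns each
# returned list into three separate columns.
top = df_1[col].apply(
    lambda x: list(x.index[np.array(x).argsort()[::-1][:3]]),
    axis=1, result_type='expand',
)
top.columns = ['Top1', 'Top2', 'Top3']

# join aligns on the index, so every row keeps its own top three.
df_1 = df_1.join(top)
print(df_1)
```

Joining on the index rather than building a fresh DataFrame should keep things aligned even if df_1 does not have a default 0..n-1 index.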
Upvotes: 0
Reputation: 863801
To improve performance, don't use apply, because it loops under the hood. A vectorized NumPy alternative:
cols = ['t1', 't2', 't3', 't4', 't5', 't6']
# argsort on the negated values orders each row from largest to smallest
top3 = pd.DataFrame(np.array(cols)[np.argsort(-df[cols].to_numpy(), axis=1)[:, :3]],
                    columns=['Top1', 'Top2', 'Top3'], index=df.index)
print(top3)
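If the largest values themselves are wanted alongside the column names, the same argsort result can be reused. This is only a sketch: the sample data, the `result` frame and the `_value` column names are illustrative assumptions, and it assumes numeric columns:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'id' is a column that should not be ranked.
df = pd.DataFrame({
    'id': [101, 102],
    't1': [5, 1], 't2': [9, 8], 't3': [2, 7],
    't4': [7, 3], 't5': [1, 9], 't6': [4, 2],
})
cols = ['t1', 't2', 't3', 't4', 't5', 't6']

values = df[cols].to_numpy()
# Positions of the three largest entries in each row, largest first.
order = np.argsort(-values, axis=1)[:, :3]

names = np.array(cols)[order]                            # column labels
top_values = np.take_along_axis(values, order, axis=1)   # matching values

# Work on a copy so the original frame is left untouched.
result = df.copy()
for i in range(3):
    result[f'Top{i + 1}'] = names[:, i]
    result[f'Top{i + 1}_value'] = top_values[:, i]
print(result)
```

np.take_along_axis picks, for each row, the entries at the positions argsort returned, so names and values stay in the same order.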
Upvotes: 1