Reputation: 267
I have a pandas dataframe like the following:
index  Validation_Set  Topics  Alpha       Beta       Coherence
0      75% Corpus      14      0.5         0.5        0.501483
1      75% Corpus      14      0.5         symmetric  0.481676
2      100% Corpus     14      asymmetric  0.5        0.500620
3      100% Corpus     14      0.5         symmetric  0.492288
4      75% Corpus      12      0.5         0.5        0.511823
5      75% Corpus      12      0.5         symmetric  0.477614
6      100% Corpus     12      asymmetric  0.5        0.489424
7      100% Corpus     12      0.5         symmetric  0.541270
8      75% Corpus      4       0.5         0.5        0.515683
9      75% Corpus      4       0.5         symmetric  0.430614
10     100% Corpus     4       asymmetric  0.5        0.489324
11     100% Corpus     4       0.5         symmetric  0.473570
And so on... these are results from several tests for parameter tuning.
Now I want to extract all the rows (every parameter test) for the best model only, that is, the one (or possibly more than one) that achieved the highest value of 'Coherence' on the full validation set (100% Corpus).
In this example I would get [ERROR, SEE EDIT]:
index  Validation_Set  Topics  Alpha  Beta       Coherence
7      100% Corpus     12      0.5    symmetric  0.541270
I managed to retrieve the row with the highest value for 'Coherence' in this way (df is the full dataframe):
corpus_100 = df[df['Validation_Set']=='100% Corpus']
topics_num = corpus_100.loc[corpus_100['Coherence'].idxmax(), 'Topics']  # idxmax returns an index label, so use .loc rather than .iloc
opt_model = corpus_100[corpus_100['Topics']==topics_num]
This works, but it's really messy, so I'm looking for a clearer way to implement it.
Thank you!
EDIT: I'm really sorry, there was a typo in the desired output, which should actually be:
4      75% Corpus      12      0.5         0.5        0.511823
5      75% Corpus      12      0.5         symmetric  0.477614
6      100% Corpus     12      asymmetric  0.5        0.489424
7      100% Corpus     12      0.5         symmetric  0.541270
Upvotes: 0
Views: 73
Reputation: 471
Try this:
df[df['Coherence']==df['Coherence'].max()]
df[df['column']==value]
filters the dataframe for whatever you are looking for.
df['column'].max()
returns the maximum value in 'column'.
Putting them together returns every row of the dataframe whose 'Coherence' equals the maximum value in that column.
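To make the two steps concrete, here is a minimal sketch. The small dataframe is rebuilt by hand from the Topics 12 rows in the question, and the names is_best and best_rows are only illustrative. Note that the mask looks at the whole 'Coherence' column; if only the '100% Corpus' rows should count, the dataframe would need to be filtered on 'Validation_Set' first.

```python
import pandas as pd

# Sample rows copied from the question (Topics 12 only, to keep it short).
df = pd.DataFrame(
    {
        "Validation_Set": ["75% Corpus", "75% Corpus", "100% Corpus", "100% Corpus"],
        "Topics": [12, 12, 12, 12],
        "Alpha": ["0.5", "0.5", "asymmetric", "0.5"],
        "Beta": ["0.5", "symmetric", "0.5", "symmetric"],
        "Coherence": [0.511823, 0.477614, 0.489424, 0.541270],
    },
    index=[4, 5, 6, 7],
)

# Boolean mask: True where Coherence equals the column maximum.
is_best = df["Coherence"] == df["Coherence"].max()

# Indexing with the mask keeps only the matching rows (ties are all kept).
best_rows = df[is_best]
print(best_rows)
```

Because the comparison is done against the maximum itself, several rows come back if they share the same top score.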
Upvotes: 1
Reputation: 5955
Looks like nlargest() is exactly what you need
df[df['Validation_Set']=='100% Corpus'].nlargest(1,'Coherence')
index  Validation_Set  Topics  Alpha  Beta       Coherence
7      100% Corpus     12      0.5    symmetric  0.54127
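If the goal is the output from the question's EDIT (every row for the winning number of topics, not just the single best row), one possible extension is to use the 'Topics' value of the nlargest() result to select from the full dataframe. The sketch below rebuilds the twelve sample rows from the question; the names full, best and opt_model are illustrative, and keep="all" is used on the assumption that ties for first place should all be kept.

```python
import pandas as pd

# The twelve sample rows shown in the question.
rows = [
    ("75% Corpus", 14, "0.5", "0.5", 0.501483),
    ("75% Corpus", 14, "0.5", "symmetric", 0.481676),
    ("100% Corpus", 14, "asymmetric", "0.5", 0.500620),
    ("100% Corpus", 14, "0.5", "symmetric", 0.492288),
    ("75% Corpus", 12, "0.5", "0.5", 0.511823),
    ("75% Corpus", 12, "0.5", "symmetric", 0.477614),
    ("100% Corpus", 12, "asymmetric", "0.5", 0.489424),
    ("100% Corpus", 12, "0.5", "symmetric", 0.541270),
    ("75% Corpus", 4, "0.5", "0.5", 0.515683),
    ("75% Corpus", 4, "0.5", "symmetric", 0.430614),
    ("100% Corpus", 4, "asymmetric", "0.5", 0.489324),
    ("100% Corpus", 4, "0.5", "symmetric", 0.473570),
]
df = pd.DataFrame(
    rows, columns=["Validation_Set", "Topics", "Alpha", "Beta", "Coherence"]
)

# Step 1: restrict to the full validation set.
full = df[df["Validation_Set"] == "100% Corpus"]

# Step 2: best row(s) by Coherence; keep="all" retains ties for first place.
best = full.nlargest(1, "Coherence", keep="all")

# Step 3: every test (75% and 100%) run with the winning number of topics.
opt_model = df[df["Topics"].isin(best["Topics"])]
print(opt_model)
```

Using isin() rather than == means the selection still works if two different 'Topics' values tie for the highest score.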
Upvotes: 0