Reputation: 97
I am using Pandas in Python and having a spot of trouble. I have a dataframe with an index and 2 columns: "VIFFactor" and "features".
I am trying to return the "features" value from the row of my dataframe that has the maximum value in "VIFFactor", provided that value is over 5.
Here's my code:
vif3 = vif_test.loc[(vif_test['VIFFactor'] >= 5) & (vif_test['VIFFactor'].idxmax()), 'features']
I have tried replacing idxmax with max and got "cannot compare a dtyped [bool] array with a scalar of type [bool]".
So for example, from the below, I would like to return HadCampaign because it is the highest record and over 5 but I am currently getting nothing:
    VIFFactor            features
2   12.028754355028974   HadCampaign
22  11.98926492333954    DiscountedPrice
29  5.460195615389739    RatingsReceivedRank
30  4.59851607313422     SortOrder
19  3.0681452496804833   PreferredPartnerBadge
9   3.0554578279939815   PerkCustomerDropService
28  2.735597253984768    RatingsReceived
26  2.263922204962396    PriceRank
Upvotes: 0
Views: 4924
Reputation: 323226
This will not work; you should look at max instead. Your first condition produces a boolean Series whose length is len(df), but idxmax only returns the index label of the row holding the max value, which is a single value. To slice the dataframe with &, you need to pass two conditions of the same length:
vif3 = vif_test.loc[(vif_test['VIFFactor'] >= 5) & (vif_test['VIFFactor'].max() == vif_test['VIFFactor']), 'features']
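To make the difference concrete, here is a minimal sketch on a cut-down copy of the question's data (values rounded by me, so treat the frame as illustrative rather than the asker's exact one). It shows that idxmax gives back a single label, while comparing the column against its max gives a mask that can be combined with the threshold condition.

```python
import pandas as pd

# Illustrative subset of the question's data; values are rounded.
vif_test = pd.DataFrame(
    {
        "VIFFactor": [12.0288, 11.9893, 5.4602, 4.5985],
        "features": ["HadCampaign", "DiscountedPrice",
                     "RatingsReceivedRank", "SortOrder"],
    },
    index=[2, 22, 29, 30],
)

# idxmax() returns one index label, not a boolean mask.
label = vif_test["VIFFactor"].idxmax()

# Both of these are boolean Series with one entry per row.
over_threshold = vif_test["VIFFactor"] >= 5
is_max = vif_test["VIFFactor"] == vif_test["VIFFactor"].max()

# The selection is a Series; it is empty when the maximum is below 5.
vif3 = vif_test.loc[over_threshold & is_max, "features"]
print(vif3)
```

Note that the result is a Series rather than a plain string, so something like vif3.iloc[0] is needed to pull out the name, after checking that the Series is not empty.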
Upvotes: 1
Reputation: 346
Splitting over two lines might be clearer:
vif_test = vif_test[vif_test.VIFFactor > 5].set_index('VIFFactor')
vif3 = vif_test.loc[max(vif_test.index), 'features']
Note: max() may often be faster than index.max(). Testing on a small dataframe:
%timeit d.index.max()
34.3 µs ± 447 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit max(d.index)
9.43 µs ± 143 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
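The filter-then-take-the-max idea can also be sketched without replacing the index, which additionally lets you handle the case where no row is over the threshold (max() on an empty index raises an error). The helper name top_feature_over and the sample frame are my own, purely for illustration.

```python
import pandas as pd

def top_feature_over(df, threshold=5.0):
    """Hypothetical helper: feature with the largest VIFFactor above
    threshold, or None when no row qualifies."""
    candidates = df[df["VIFFactor"] > threshold]
    if candidates.empty:
        return None
    # idxmax() gives the label of the winning row; .loc fetches its feature.
    return candidates.loc[candidates["VIFFactor"].idxmax(), "features"]

# Illustrative subset of the question's data; values are rounded.
vif_test = pd.DataFrame(
    {
        "VIFFactor": [12.0288, 11.9893, 5.4602, 4.5985],
        "features": ["HadCampaign", "DiscountedPrice",
                     "RatingsReceivedRank", "SortOrder"],
    },
    index=[2, 22, 29, 30],
)

result = top_feature_over(vif_test)
nothing = top_feature_over(vif_test, threshold=20.0)
print(result, nothing)
```

This assumes the index labels are unique; with duplicate labels, .loc could return more than one row.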
Upvotes: 0