Reputation: 2198
How do I get the row of the below csv's highest price from the result retrieved in the for loop any references or solutions ?
import pandas as pd
import numpy as np
df = pd.read_csv('/Users/caleb/ic/US_FINAL.csv',names=['ProductName', 'Year', 'Production', 'Price'])
df.set_index("ProductName",inplace=True)
products=['FortuneCookie']
for product in products:
print(df.loc[product])
Upvotes: 0
Views: 155
Reputation: 59549
You were close. But df.loc[product]
was not doing what you wanted it to. You want to find where df['ProductName'] == product
, so the subsets are given by df.loc[df['ProductName']==product]
. The rest just determines the row where the price is the max.
for product in products:
print(df.loc[df.Price == df[df.ProductName == product]['Price'].max()])
Here's a full example if you want to play around with it. If you want to generate this for every Product in your dataframe, you may consider using df.ProductName.unique()
instead of explicitly defining the list, since you won't have to specify every product individually.
import pandas as pd
df = pd.DataFrame({'ProductName': ['Sandwich', 'FortuneCookie', 'FortuneCookie', 'Sandwich'],
'Year': [2010, 2010, 2011, 2009],
'Production': [17, 18, 101, 17],
'Price': [21.11, 17.82, 19.99, 20.13]})
products = df.ProductName.unique()
for product in products:
print(df.loc[df.Price == df[df.ProductName == product]['Price'].max()])
Price ProductName Production Year
0 21.11 Sandwich 17 2010
Price ProductName Production Year
2 19.99 FortuneCookie 101 2011
Upvotes: 2
Reputation: 21719
I think a simple groupby
function can get you:
# using data from @ALLoLz answer
df = pd.DataFrame({'ProductName': ['Sandwich', 'FortuneCookie', 'FortuneCookie', 'Sandwich'],
'Year': [2010, 2010, 2011, 2009],
'Production': [17, 18, 101, 17],
'Price': [21.11, 17.82, 19.99, 20.13]})
df.groupby('ProductName')['Price'].max()
# alternative
# df.groupby('ProductName').agg({'Price':'max'})
Upvotes: 1
Reputation: 324
I think the df.idxmax() function would your best bet
So for you it would be
for product in products:
temp = df.loc[product]
print(temp.loc[temp['Price'].idxmax()])
Upvotes: 1