Reputation: 41

Python pandas multiple conditions

Sorry, I apologise now, just started learning Python and trying to get something working.

Ok dataset is

Buy, typeid, volume, issued, duration, Volume Entered,Minimum Volume, range, price, locationid, locationname

SELL    20  2076541 2015-09-12T06:31:13 90  2076541 1   region  331.21  60008494    Amarr

SELL    20  194642  2015-09-07T19:36:49 90  194642  1   region  300 60008494    Amarr

SELL    20  2320    2015-09-13T07:48:54 3   2320    1   region  211 60008491    Irnin

I would like to filter for a specific location either by name or ID, doesn't bother me, then to pick the minimum price for that location. Preferably to hardcode it in, since I only have a few locations I'm interested. e.g locationid = 60008494.

I see you can do two conditions on one line, but I don't see how to apply it. So I'm trying to nest it. Doesn't have to be pandas, just seems the first thing I found that did one part of what I required.

The code I've gotten so far is, and only does the minimum part of what I'm looking to achieve.

data = pd.read_csv('orders.csv')
length = len(data['typeid'].unique())
res = pd.DataFrame(columns=('Buy', 'typeid', 'volume','duration','volumeE','Minimum','range','price','locationid','locationname'))
for i in range(0,length):
        name_filter = data[data['typeid'] == data['typeid'].unique()[i]]
        price_min_filter = name_filter[name_filter['price'] == name_filter['price'].min() ]  
        res = res.append(price_min_filter, ignore_index=True) 
        i=i+1
        res.to_csv('format.csv') # writes output to csv
print "Complete"

UPDATED. Ok so, the latest part, seems like the following code is the direction I should be going in. If I could have s=typeid, locationid and price, thats perfect. So I've written what I want to do, whats the correct syntax to get that in python? Sorry I'm used to Excel and SQL.

import pandas as pd

df = pd.read_csv('orders.csv')
df[df['locationid'] ==60008494]
s= df.groupby(['typeid'])['price'].min()
s.to_csv('format.csv')

Upvotes: 0

Answers (2)

Anand S Kumar

Reputation: 91009

If what you really want is -

I would like to filter for a specific location either by name or ID, doesn't bother me, then to pick the minimum price for that location. Preferably to hardcode it in, since I only have a few locations I'm interested. e.g locationid = 60008494.

You can simply filter the df on the locationid first and then use ['price'].min(). Example -

In [1]: import pandas as pd

In [2]: s = """Buy,typeid,volume,issued,duration,Volume Entered,Minimum Volume,range,price,locationid,locationname
   ...: SELL,20,2076541,2015-09-12T06:31:13,90,2076541,1,region,331.21,60008494,Amarr
   ...: SELL,20,194642,2015-09-07T19:36:49,90,194642,1,region,300,60008494,Amarr
   ...: SELL,20,2320,2015-09-13T07:48:54,3,2320,1,region,211,60008491,Irnin"""

In [3]: import io

In [4]: df = pd.read_csv(io.StringIO(s))

In [5]: df
Out[5]:
    Buy  typeid   volume               issued  duration  Volume Entered  \
0  SELL      20  2076541  2015-09-12T06:31:13        90         2076541
1  SELL      20   194642  2015-09-07T19:36:49        90          194642
2  SELL      20     2320  2015-09-13T07:48:54         3            2320

   Minimum Volume   range   price  locationid locationname
0               1  region  331.21    60008494        Amarr
1               1  region  300.00    60008494        Amarr
2               1  region  211.00    60008491        Irnin

In [8]: df[df['locationid']==60008494]['price'].min()
Out[8]: 300.0

If you want to do it for all the locationids', then as said in the other answer you can use DataFrame.groupby for that and then take the ['price'] column for the group you want and use .min(). Example -

data = pd.read_csv('orders.csv')
data.groupby(['locationid'])['price'].min()

Demo -

In [9]: df.groupby(['locationid'])['price'].min()
Out[9]:
locationid
60008491    211
60008494    300
Name: price, dtype: float64

For getting the complete row which has minimum values in the corresponding groups, you can use idxmin() to get the index for the minimum value and then pass it to df.loc to get those rows. Example -

In [9]: df.loc[df.groupby(['locationid'])['price'].idxmin()]
Out[9]:
    Buy  typeid  volume               issued  duration  Volume Entered  \
2  SELL      20    2320  2015-09-13T07:48:54         3            2320
1  SELL      20  194642  2015-09-07T19:36:49        90          194642

   Minimum Volume   range  price  locationid locationname
2               1  region    211    60008491        Irnin
1               1  region    300    60008494        Amarr

Upvotes: 1

dagrha

Reputation: 2569

If I understand your question correctly, you really won't need to do much more than a DataFrame.Groupby(). As an example, you can group the dataframe by the locationname, then select the price column from the resulting groupby object, then use the min() method to output the minimum value for each group:

data.groupby('locationname')['price'].min()

which will give you the minimum value of price for each group. So it will look something like:

locationname
Amarr    300
Irnin    211
Name: price, dtype: float64

Upvotes: 0

Python pandas multiple conditions

Answers (2)

Related Questions