Reputation: 63
I have a pandas dataframe with around 1000 rows. here's a made up version which applies to my issue. If i wanted to filter out the Trade IDs when the trade quantity was over 10 this would leave single trade ID's (for instance the second 523 would drop and leave the first one).
I understand I need to lookup through Trade Quantity column and find values which are < 10 then return the trade id from that row, then drop rows where the trade id matches.
I understand this is incorrect, but this is what I had before.
df.drop(df[df['Trade Quantity'] < 10], inplace =True
Could an iloc statement work? and return the index
Ideally the dataframe would be updated so that its left with just the Trade ID 487, my code has no issue for the last two rows as both have a trade quantity under 10 but the issues lies in the case when one quantity is above the threshold and one is below. See below image for desired output.
Upvotes: 2
Views: 3960
Reputation: 8302
try this,
unique_ids = df.loc[df['Trade Quantity'] < 10, 'Trade ID'].unique()
df = df[~df['Trade ID'].isin(unique_ids)]
Upvotes: 1
Reputation: 12493
Here's a fairly standard way to do that using groupby / transform:
df = pd.DataFrame({"trade_id": [523, 523, 487, 487, 367, 367],
"buy_or_sell": ["b", "s", "b", "s", "b", "s"],
"quantity" : [15, 5, 13, 13, 4, 4]})
df["min_quantity"] = df.groupby("trade_id")["quantity"].transform(min)
df[df.min_quantity > 10]
The output is:
trade_id buy_or_sell quantity min_quantity
2 487 b 13 13
3 487 s 13 13
Upvotes: 1
Reputation: 269
It's not pretty, but I think this may do what you're after? Removes any entries where there is only one row with the corresponding Trade ID
df.drop(df[(df["Trade Quantity"] < 10)].index, inplace = True)
for each in set(df["Trade ID"]):
if len(df[(df["Trade ID"]) == each]) < 2:
df.drop(df[(df["Trade ID"] == each)].index, inplace = True)
Upvotes: 2