Reputation: 25
I want to delete specific rows from a Pandas dataframe based on conditions.
For example, when several currency pairs share the same time, I want to keep only one pair per time.
This is the priority: EUR, USD, GBP, CHF.
currency timebuy buyprice
CNHUSD 2021-01-05 08:30:00 0,00005073
CNHGBP 2021-01-05 08:30:00 1,588
ZARGBP 2021-01-07 05:15:00 0,2727
ZARUSD 2021-01-07 05:15:00 300
ZAREUR 2021-01-07 13:00:00 0,1936
ZARCHF 2021-01-07 13:00:00 0,0000052
JPYCHF 2021-01-13 06:00:00 0,0002222
JPYUSD 2021-01-13 06:00:00 8
JPYGBP 2021-01-13 06:00:00 8
to
currency timebuy buyprice
CNHUSD 2021-01-05 08:30:00 0,00005073
ZAREUR 2021-01-07 13:00:00 0,1936
JPYUSD 2021-01-13 06:00:00 8
Upvotes: 2
Views: 84
Reputation: 11105
Using groupby and reindex:
# Hard-code your priority for the second currency in each pair
pri = ['EUR', 'USD', 'GBP', 'CHF']
# Create a new column for the second currency of each pair
df['2ndcurr'] = df['currency'].str[-3:]
# Move timebuy and the second currency into a MultiIndex,
# Reorder the inner level (1) of that MultiIndex to match the priority,
# Group by the outer level (0), i.e. timebuy,
# Get the first row of each group,
# Reset timebuy from the index into its own column
(df.set_index(['timebuy', '2ndcurr'])
.reindex(pri, level=1)
.groupby(level=0)
.first()
.reset_index())
timebuy currency buyprice
0 2021-01-05 08:30:00 CNHUSD 0,00005073
1 2021-01-07 05:15:00 ZARUSD 300
2 2021-01-07 13:00:00 ZAREUR 0,1936
3 2021-01-13 06:00:00 JPYUSD 8
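For comparison, a similar result can be sketched without a MultiIndex by ranking each row, sorting, and keeping the first row per timebuy. This is a different technique from the reindex chain above, not a restatement of it. It assumes the sample data from the question, keeps buyprice as strings with comma decimals, and uses a hypothetical helper column named _rank.

```python
import pandas as pd

# Sample data copied from the question; buyprice is kept as text.
df = pd.DataFrame({
    "currency": ["CNHUSD", "CNHGBP", "ZARGBP", "ZARUSD", "ZAREUR",
                 "ZARCHF", "JPYCHF", "JPYUSD", "JPYGBP"],
    "timebuy": pd.to_datetime([
        "2021-01-05 08:30:00", "2021-01-05 08:30:00",
        "2021-01-07 05:15:00", "2021-01-07 05:15:00",
        "2021-01-07 13:00:00", "2021-01-07 13:00:00",
        "2021-01-13 06:00:00", "2021-01-13 06:00:00",
        "2021-01-13 06:00:00",
    ]),
    "buyprice": ["0,00005073", "1,588", "0,2727", "300", "0,1936",
                 "0,0000052", "0,0002222", "8", "8"],
})

pri = ["EUR", "USD", "GBP", "CHF"]

# Lower number = higher priority; "_rank" is a hypothetical helper column.
rank = df["currency"].str[-3:].map({c: i for i, c in enumerate(pri)})

result = (
    df.assign(_rank=rank)
      .sort_values(["timebuy", "_rank"])        # best-ranked pair first per time
      .drop_duplicates("timebuy", keep="first") # keep one row per timebuy
      .drop(columns="_rank")
      .reset_index(drop=True)
)
print(result)
```

On the sample data this keeps CNHUSD, ZARUSD, ZAREUR and JPYUSD, the same four rows as the output above.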
Upvotes: 1
Reputation:
For a priority list like this, it's easiest to work with numbers. So, you can create a nice numeric mapping from your priority list, and use it to pick rows:
priority = ['EUR', 'USD', 'GBP', 'CHF']
mapping = {p: i for i, p in enumerate(priority)}
indexes = df['currency'].str[-3:].map(mapping).groupby(df['currency'].str[:3]).idxmin().sort_values()
selected = df.loc[indexes]
Output:
>>> selected
currency timebuy buyprice
0 CNHUSD 2021-01-05 08:30:00 0,00005073
4 ZAREUR 2021-01-07 13:00:00 0,1936
7 JPYUSD 2021-01-13 06:00:00 8
One-liner:
priority = ['EUR', 'USD', 'GBP', 'CHF']
filtered = df.loc[df['currency'].str[-3:].map({p: i for i, p in enumerate(priority)}).groupby(df['currency'].str[:3]).idxmin().sort_values()]
If you want to group by each timestamp instead of the first 3 letters of currency, group by df['timebuy'] instead of df['currency'].str[:3], i.e.:
indexes = df['currency'].str[-3:].map(mapping).groupby(df['timebuy']).idxmin().sort_values()
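To see how the choice of grouping key changes the result, here is a small self-contained sketch on the question's sample data. The function name pick_by_priority is made up for illustration, buyprice is kept as strings, and every pair is assumed to end in one of the four priority currencies (otherwise the mapped rank would contain NaN).

```python
import pandas as pd

# Sample data copied from the question; buyprice is kept as text.
df = pd.DataFrame({
    "currency": ["CNHUSD", "CNHGBP", "ZARGBP", "ZARUSD", "ZAREUR",
                 "ZARCHF", "JPYCHF", "JPYUSD", "JPYGBP"],
    "timebuy": pd.to_datetime([
        "2021-01-05 08:30:00", "2021-01-05 08:30:00",
        "2021-01-07 05:15:00", "2021-01-07 05:15:00",
        "2021-01-07 13:00:00", "2021-01-07 13:00:00",
        "2021-01-13 06:00:00", "2021-01-13 06:00:00",
        "2021-01-13 06:00:00",
    ]),
    "buyprice": ["0,00005073", "1,588", "0,2727", "300", "0,1936",
                 "0,0000052", "0,0002222", "8", "8"],
})

priority = ["EUR", "USD", "GBP", "CHF"]
mapping = {p: i for i, p in enumerate(priority)}


def pick_by_priority(frame, key):
    """Hypothetical helper: keep the best-ranked row within each group of `key`."""
    rank = frame["currency"].str[-3:].map(mapping)  # 0 = highest priority
    best = rank.groupby(key).idxmin().sort_values() # index label of best row per group
    return frame.loc[best]


by_base = pick_by_priority(df, df["currency"].str[:3])  # one row per CNH / ZAR / JPY
by_time = pick_by_priority(df, df["timebuy"])           # one row per timestamp

print(by_base)
print(by_time)
```

Grouping by the first three letters keeps rows 0, 4 and 7, which matches the three rows the question asks for. Grouping by timebuy also keeps row 3 (ZARUSD at 05:15), because that timestamp forms its own group.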
Upvotes: 1