Delete specific rows from a pandas DataFrame based on conditions on rows

I want to delete specific rows from a pandas DataFrame based on conditions on the rows.

For example, since I have several currency pairs at the same time, I want to keep only one of the pairs that share the same time.

The priority order for the second currency of each pair is: EUR, USD, GBP, CHF.

currency  timebuy              buyprice
CNHUSD    2021-01-05 08:30:00  0,00005073
CNHGBP    2021-01-05 08:30:00  1,588
ZARGBP    2021-01-07 05:15:00  0,2727
ZARUSD    2021-01-07 05:15:00  300
ZAREUR    2021-01-07 13:00:00  0,1936
ZARCHF    2021-01-07 13:00:00  0,0000052
JPYCHF    2021-01-13 06:00:00  0,0002222
JPYUSD    2021-01-13 06:00:00  8
JPYGBP    2021-01-13 06:00:00  8


and I want to reduce it to:

currency  timebuy              buyprice
CNHUSD    2021-01-05 08:30:00  0,00005073
ZAREUR    2021-01-07 13:00:00  0,1936
JPYUSD    2021-01-13 06:00:00  8
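To make the examples reproducible, here is a minimal sketch that builds the sample frame from the table above. It assumes `timebuy` should be parsed as datetimes and that `buyprice` is kept as strings with decimal commas, exactly as shown. Convert the prices separately if you need them as numbers.

```python
import pandas as pd

# Sample data from the question; buyprice keeps the decimal commas as strings.
df = pd.DataFrame({
    "currency": ["CNHUSD", "CNHGBP", "ZARGBP", "ZARUSD", "ZAREUR",
                 "ZARCHF", "JPYCHF", "JPYUSD", "JPYGBP"],
    "timebuy": pd.to_datetime([
        "2021-01-05 08:30:00", "2021-01-05 08:30:00",
        "2021-01-07 05:15:00", "2021-01-07 05:15:00",
        "2021-01-07 13:00:00", "2021-01-07 13:00:00",
        "2021-01-13 06:00:00", "2021-01-13 06:00:00", "2021-01-13 06:00:00",
    ]),
    "buyprice": ["0,00005073", "1,588", "0,2727", "300", "0,1936",
                 "0,0000052", "0,0002222", "8", "8"],
})

print(df)
```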


Upvotes: 2

Views: 84

Answers (2)

Peter Leimbigler

Reputation: 11105

Using groupby and reindex:

# Hard-code your priority for the second currency in each pair
pri = ['EUR', 'USD', 'GBP', 'CHF']

# Create a new column for the second currency of each pair
df['2ndcurr'] = df['currency'].str[-3:]


# Index by time and second currency,
# Reindex the inner level (1) of the resulting MultiIndex so rows follow the priority order,
# Group by the outer level (0),
# Get the first row of each group,
# Reset timebuy from the index into its own column

(df.set_index(['timebuy', '2ndcurr'])
   .reindex(pri, level=1)
   .groupby(level=0)
   .first()
   .reset_index())

               timebuy currency    buyprice
0  2021-01-05 08:30:00   CNHUSD  0,00005073
1  2021-01-07 05:15:00   ZARUSD         300
2  2021-01-07 13:00:00   ZAREUR      0,1936
3  2021-01-13 06:00:00   JPYUSD           8
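The same per-timestamp result can also be reached without MultiIndex reindexing. The sketch below uses a different technique from the answer above: it ranks each row by its second currency, sorts by `timebuy` and that rank, and keeps the first row per timestamp with `drop_duplicates`. The helper column name `_rank` is a hypothetical choice and is dropped at the end. The sample data is rebuilt inline so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "currency": ["CNHUSD", "CNHGBP", "ZARGBP", "ZARUSD", "ZAREUR",
                 "ZARCHF", "JPYCHF", "JPYUSD", "JPYGBP"],
    "timebuy": pd.to_datetime([
        "2021-01-05 08:30:00", "2021-01-05 08:30:00",
        "2021-01-07 05:15:00", "2021-01-07 05:15:00",
        "2021-01-07 13:00:00", "2021-01-07 13:00:00",
        "2021-01-13 06:00:00", "2021-01-13 06:00:00", "2021-01-13 06:00:00",
    ]),
    "buyprice": ["0,00005073", "1,588", "0,2727", "300", "0,1936",
                 "0,0000052", "0,0002222", "8", "8"],
})

pri = ["EUR", "USD", "GBP", "CHF"]
# Numeric rank of the second currency: EUR -> 0, USD -> 1, GBP -> 2, CHF -> 3
rank = df["currency"].str[-3:].map({c: i for i, c in enumerate(pri)})

result = (
    df.assign(_rank=rank)                      # temporary helper column
      .sort_values(["timebuy", "_rank"])       # best-priority row first within each time
      .drop_duplicates(subset="timebuy", keep="first")
      .drop(columns="_rank")
      .reset_index(drop=True)
)
print(result)
```

Note that a second currency missing from `pri` gets a NaN rank, which `sort_values` places last by default. The reindex approach above instead drops such rows entirely.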

Upvotes: 1

user17242583

Reputation:

For a priority list like this, it's easiest to work with numbers. You can build a numeric mapping from your priority list and use it to pick rows:

priority = ['EUR', 'USD', 'GBP', 'CHF']
mapping = {p: i for i, p in enumerate(priority)}
indexes = df['currency'].str[-3:].map(mapping).groupby(df['currency'].str[:3]).idxmin().sort_values()
selected = df.loc[indexes]

Output:

>>> selected
  currency             timebuy    buyprice
0   CNHUSD 2021-01-05 08:30:00  0,00005073
4   ZAREUR 2021-01-07 13:00:00      0,1936
7   JPYUSD 2021-01-13 06:00:00           8
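For checking, here is a self-contained version of the snippet above with the sample data built inline and the chained expression split into named steps. The intermediate names `quote_rank` and `base` are illustrative additions. `idxmin` returns, for each base currency (first three letters), the index label of the row whose second currency has the best priority. `sort_values` then puts those labels back in original row order.

```python
import pandas as pd

df = pd.DataFrame({
    "currency": ["CNHUSD", "CNHGBP", "ZARGBP", "ZARUSD", "ZAREUR",
                 "ZARCHF", "JPYCHF", "JPYUSD", "JPYGBP"],
    "timebuy": pd.to_datetime([
        "2021-01-05 08:30:00", "2021-01-05 08:30:00",
        "2021-01-07 05:15:00", "2021-01-07 05:15:00",
        "2021-01-07 13:00:00", "2021-01-07 13:00:00",
        "2021-01-13 06:00:00", "2021-01-13 06:00:00", "2021-01-13 06:00:00",
    ]),
    "buyprice": ["0,00005073", "1,588", "0,2727", "300", "0,1936",
                 "0,0000052", "0,0002222", "8", "8"],
})

priority = ["EUR", "USD", "GBP", "CHF"]
mapping = {p: i for i, p in enumerate(priority)}     # EUR -> 0, USD -> 1, ...

quote_rank = df["currency"].str[-3:].map(mapping)    # rank of the second currency
base = df["currency"].str[:3]                        # first currency, e.g. "ZAR"

# One index label per base currency: the row with the lowest (best) rank
indexes = quote_rank.groupby(base).idxmin().sort_values()
selected = df.loc[indexes]
print(selected)
```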

One-liner:

priority = ['EUR', 'USD', 'GBP', 'CHF']
filtered = df.loc[df['currency'].str[-3:].map({p: i for i, p in enumerate(priority)}).groupby(df['currency'].str[:3]).idxmin().sort_values()]

If you want to group by each timestamp instead of by the first three letters of the currency, group by df['timebuy'] instead of df['currency'].str[:3], i.e.:

indexes = df['currency'].str[-3:].map(mapping).groupby(df['timebuy']).idxmin().sort_values()
                                                     # ^^^^^^^^^^^^^
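A runnable sketch of this per-timestamp variant, assuming the sample data from the question. With one winner per `timebuy` value, it keeps rows 0, 3, 4 and 7, the same rows as the first answer's `reindex` output.

```python
import pandas as pd

df = pd.DataFrame({
    "currency": ["CNHUSD", "CNHGBP", "ZARGBP", "ZARUSD", "ZAREUR",
                 "ZARCHF", "JPYCHF", "JPYUSD", "JPYGBP"],
    "timebuy": pd.to_datetime([
        "2021-01-05 08:30:00", "2021-01-05 08:30:00",
        "2021-01-07 05:15:00", "2021-01-07 05:15:00",
        "2021-01-07 13:00:00", "2021-01-07 13:00:00",
        "2021-01-13 06:00:00", "2021-01-13 06:00:00", "2021-01-13 06:00:00",
    ]),
    "buyprice": ["0,00005073", "1,588", "0,2727", "300", "0,1936",
                 "0,0000052", "0,0002222", "8", "8"],
})

priority = ["EUR", "USD", "GBP", "CHF"]
mapping = {p: i for i, p in enumerate(priority)}

# Group the priority ranks by timestamp instead of by base currency
indexes = (df["currency"].str[-3:].map(mapping)
             .groupby(df["timebuy"])
             .idxmin()
             .sort_values())
selected = df.loc[indexes]
print(selected)
```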

Upvotes: 1
