KBriggs
KBriggs

Reputation: 1412

SQL query to delete elements that are not sorted in pandas DataFrame

I am writing a little code to optimize an NFL fantasy draft. Because the database is fairly large, my first task is to cut it down to size by removing entries which I know for a fast will not form part of the optimal draft.

Each player has a salary which I have to pay, and a projected fantasy point total. For example, here is part of the list of possible defenders:

   Id           Position  FPPG    Salary 

0  12542        D         3.8     4000  
1  12534        D         7.5     4100  
2  12548        D         4.6     4200  
3  12532        D         8.2     4300  
4  12547        D         8.4     4400  
5  12545        D         9.2     4500  
6  12525        D         9.0     4600  
7  12553        D         12.3    4700  
8  12544        D         10.8    4800  
9  12531        D         17.0    5000  

I have already written SQL queries that will eliminate players with injuries, and will eliminate all but the highest projected scorer at each salary price point:

d_db = sqldf('SELECT Id, Position, FPPG, Salary from d_db\
            WHERE FPPG = (SELECT max(FPPG) from d_db as db where db.Salary = d_db.Salary) ORDER By Salary',locals())

My task now is to eliminate all the players for which there exists a cheaper player who is projected to get more points. For example, in the list above, row #2 is useless and can be eliminated since row #1 is both cheaper and has a higher FPPG.

Put another way, I want to delete all rows for which the FPPG column is not sorted in ascending order when the db is sorted by Salary. The final result would ideally be:

   Id           Position  FPPG    Salary 

0  12542        D         3.8     4000  
1  12534        D         7.5     4100  
2  12532        D         8.2     4300  
3  12547        D         8.4     4400  
4  12545        D         9.2     4500  
5  12553        D         12.3    4700  
6  12531        D         17.0    5000  

Any suggestions? I can't seem to figure out a WHERE condition for a DELETE statement that covers this nicely, so maybe this is more complicated than a single query?

Upvotes: 1

Views: 256

Answers (1)

Juan Carlos Oropeza
Juan Carlos Oropeza

Reputation: 48197

SQL Fiddle Demo

Using LEFT JOIN you try to match each player with someone better and cheaper.

This for debug

SELECT *
FROM Table1 t1
LEFT JOIN Table1 t2
       ON t1."FPPG" < t2."FPPG"
      AND t1."Salary" > t2."Salary"

OUTPUT

| row |    Id | Position | FPPG | Salary |    row |     Id | Position |   FPPG | Salary |
|-----|-------|----------|------|--------|--------|--------|----------|--------|--------|
|   0 | 12542 |        D |    4 |   4000 | (null) | (null) |   (null) | (null) | (null) |
|   1 | 12534 |        D |    8 |   4100 | (null) | (null) |   (null) | (null) | (null) |
|   2 | 12548 |        D |    5 |   4200 |      1 |  12534 |        D |      8 |   4100 |
|   3 | 12532 |        D |    8 |   4300 | (null) | (null) |   (null) | (null) | (null) |
|   4 | 12547 |        D |    8 |   4400 | (null) | (null) |   (null) | (null) | (null) |
|   5 | 12545 |        D |    9 |   4500 | (null) | (null) |   (null) | (null) | (null) |
|   6 | 12525 |        D |    9 |   4600 | (null) | (null) |   (null) | (null) | (null) |
|   7 | 12553 |        D |   12 |   4700 | (null) | (null) |   (null) | (null) | (null) |
|   8 | 12544 |        D |   11 |   4800 |      7 |  12553 |        D |     12 |   4700 |
|   9 | 12531 |        D |   17 |   5000 | (null) | (null) |   (null) | (null) | (null) |

Your final query, return rows without a match t2."Id" IS NULL.

SELECT t1.*
FROM Table1 t1
LEFT JOIN Table1 t2
       ON t1."FPPG" < t2."FPPG"
      AND t1."Salary" > t2."Salary"
WHERE 
      t2."Id" IS NULL

NOTE:

  • I remove AND t1."Id" <> t2."Id" because previous comparasion already validate have different values.
  • Make sure you include index for FPPG and Salary

Upvotes: 1

Related Questions