1--
1--

Reputation: 61

SQL query to get binary column based on existing column with group by function

I'm trying to use MySQL to Group based on two columns, choose the most recent date, and create two binary columns to records status. Here's an example table

_______________________________________________________________________
Letters | Numbers|         dates        |   score    |  random_status |
_______________________________________________________________________
A       |   2    | 2021-09-29 0:00:00   |    0.3     |   Sent         |
A       |   2    | 2021-10-01 0:00:00   |    1.4     |   Received     |
A       |   5    | 2021-10-04 0:00:00   |    0.8     |   Sent         |
A       |   7    | 2021-10-20 0:00:00   |    0.9     |   Sent         |
A       |   7    | 2021-10-20 0:20:00   |    0.5     |   Sent         |
R       |   7    | 2021-09-09 0:20:54   |    0.2     |   Sent         |
R       |   7    | 2021-10-14 0:00:00   |    2.5     |   Received     |
R       |   2    | 2021-10-07 0:00:00   |    0.7     |   Received     |
R       |   2    | 2021-09-14 0:00:00   |    1.7     |   Sent         |
C       |   5    | 2021-10-07 0:00:00   |    2.1     |   Sent         |
C       |   5    | 2021-10-25 0:00:00   |    3.5     |   Sent         |
C       |   7    | 2021-08-18 0:00:00   |    1.9     |   Sent         |
C       |   7    | 2021-08-29 0:00:00   |    0.6     |   Received     | 
C       |   2    | 2021-02-01 0:00:00   |    1.8     |   Sent         |

I want to group base on Letters and Numbers columns, and I want the latest date, with the latest score. I want to create two new columns based on the status column, that says if a letter and number combination was ever in sent or received status.

Something that looks like this:

Letters|    Numbers |   latest_date     |latest_score|  Has_sent| Has_received|
A      |       2    |2021-10-01 0:00:00 |     1.4    |     1    |      1      |
A      |       5    |2021-10-04 0:00:00 |     0.8    |     1    |      0      |
A      |       7    |2021-10-20 0:20:00 |     0.5    |     1    |      0      |
C      |       2    |2021-02-01 0:00:00 |     1.8    |     1    |      0      |
C      |       5    |2021-10-25 0:00:00 |     3.5    |     1    |      0      |
C      |       7    |2021-08-29 0:00:00 |     0.6    |     1    |      1      |
R      |       2    |2021-10-07 0:00:00 |     0.7    |     1    |      1      |
R      |       7    |2021-10-14 0:00:00 |     2.5    |     1    |      1      |

I used the following query

SELECT t1.Letters, t1.Numbers, MAX(t1.dates) as latest_date, t1.score as latest_score,
case when status = "Sent" then 1 else 0 end AS Has_sent,
case when status = "Received" then 1 else 0 end AS Has_received
FROM dummy_data t1
WHERE
t1.dates IN (SELECT MAX(t2.dates) FROM dummy_data t2
WHERE t1.Letters = t2.Letters AND t1.Numbers = t2.Numbers)
GROUP BY t1.Letters, t1.Numbers;

The last two columns, has_sent and has_reported, are not showing as expected. Instead I get it based on the max date. Is it doable to get it to be binary based on that status ever existed per Letter and Number combination?

Upvotes: 0

Views: 1181

Answers (2)

Ergest Basha
Ergest Basha

Reputation: 9018

Try:

select tbl1.Letters,
       tbl1.Numbers,
       tbl1.latest_date,
       tbl1.latest_score score, 
       tbl2.Has_sent,
       tbl2.Has_received
from    (
         select Letters, 
                Numbers, 
                max(dates) as `latest_date`, 
                score as `latest_score`
         from dummy_data
         where dates in ( select max(dates) 
                          from dummy_data 
                          group by Letters, Numbers )
         group by Letters, Numbers
         ) as tbl1 
     inner join 
     
        (
           select Letters, 
            Numbers ,
           max(case when random_status = "Sent" then 1 else 0 end) AS Has_sent,
           max(case when random_status = "Received" then 1 else 0 end) AS Has_received
           from dummy_data
           group by Letters, Numbers
        
        ) as tbl2 on tbl1.Letters=tbl2.Letters and tbl1.Numbers=tbl2.Numbers;

Demo: https://www.db-fiddle.com/f/usu3XK7Gn8gGqQnusmCiLk/4

Upvotes: 1

tarro
tarro

Reputation: 46

I've never used case, but from what I'm reading it appears row-wise? In that case, it won't work on both as the group by aggregates.

SELECT 
    t1.Letters, 
    t1.Numbers, 
    MAX(t1.dates)        AS latest_date, 
    LAST_VALUE(t1.score) AS latest_score, /* Since your rows appear to be chronological I assume this works */
    SELECT EXISTS(
        SELECT 1 
        FROM dummy_data 
        WHERE 
            Letters       = t1.Letters AND 
            random_status = "Sent"
    ) AS has_sent,
    SELECT EXISTS(
        SELECT 1 
        FROM dummy_data 
        WHERE 
            Letters       = t1.Letters AND 
            random_status = "Received"
    ) AS has_received
FROM dummy_data AS t1
GROUP BY t1.Letters, t1.Numbers
;

If your tables are large as many time-series are, those subqueries won't be sustainable. From the business logic I see in your example talbes:

  1. A Letter/Number pair must have a sent status.
  2. A Letter/Number pair can only have a received status if it has a historical sent status.
  3. A Letter/Number pair can have no more than one sent row and one received row. (unique constraint on Letters,Numbers,random_status)

Here is an alternative assuming those points.


SELECT 
    d_sent.Letters AS Letters,
    d_sent.Numbers AS Numbers,
    MAX(d_sent.dates, d_rcvd.dates) AS latest_date,
    (
        CASE
            WHEN isnull(d_rcvd.score) THEN d_sent.score
            ELSE d_rcvd.score
        END
    ) AS latest_score,
    1 AS is_sent,
    cast(isnull(d_rcvd.score) AS SIGNED INTEGER) AS is_received
FROM (
    SELECT * FROM dummy_data WHERE random_status="Sent"
) AS d_sent
LEFT JOIN (
    SELECT * FROM dummy_data WHERE random_status="Received"
) AS d_rcvd
ON
   d_sent.Letters = d_rcvd.Letters AND
   d_sent.Numbers = d_rcvd.Numbers
;

I'm sure there are some syntactical tweaks that must be made. Let me know how it goes.

Checking if left join is null

MySQL case function

Edit: It appears case is indeed row-wise.

Upvotes: 1

Related Questions