Evert Homan
Evert Homan

Reputation: 69

mySQL group numeric data together with last image from same table

I have a table with cell viability data containing both numeric data and images (dose-response curves). The table can contain multiple rows for the same compound (uniqued by Batch ID). each row has a unique ID, as well as a date field. Now I want to group the data by Batch ID and produce the average EC50 values, but show this together with the last dose-response Curve generated for each compound Batch_ID. The code below will select the first Curve encountered for a particular compound Batch_ID. How can I select the last one instead, but still show it together with average EC50? Any tips appreciated!

SELECT Batch_ID, avg(EC50), Curve FROM CELL_VIABILITY GROUP BY Batch_ID

Example data:

ID  Batch_ID    EC50    Curve   Date
1   ABC123  6.72    blob_1  18-06-15
2   ABC123  4.74    blob_2  18-07-10
3   ABC123  8.72    blob_3  18-08-22
4   DEF456  1.95    blob_4  18-06-15
5   DEF456  1.66    blob_5  18-07-10
6   DEF456  3.06    blob_6  18-08-22

Expected outcome:

Batch_ID    EC50    Curve
ABC123     6.73    blob_3
DEF456     2.22    blob_6

Upvotes: 1

Views: 56

Answers (1)

Madhur Bhaiya
Madhur Bhaiya

Reputation: 28834

Remember that data is unordered set. Without defining a specific order, we cannot determine what is last, first etc.

We can use the Date column to define the Order. Latest updated entry (Maximum Date value) can be considered as "Last".

We can then use Correlated Subquery to determine the Last Curve for a specific Batch_ID.

SELECT cv1.Batch_ID, 
       AVG(cv1.EC50), 
       MAX((SELECT cv2.Curve 
           FROM CELL_VIABILITY AS cv2 
           WHERE cv2.Batch_ID = cv1.Batch_ID
           ORDER BY cv2.Date DESC LIMIT 1)) AS Last_Curve
FROM CELL_VIABILITY AS cv1
GROUP BY Batch_ID

Another approach would be using a Derived Table. We can get the last Date value for every Batch_ID. Then "Self-Join" to the table using the maximum value of the Date to get the Last Curve:

SELECT 
  cv1.Batch_ID, 
  cv1.average, 
  cv2.Curve
FROM 
(
 SELECT Batch_ID, 
        AVG(EC50) AS average, 
        MAX(Date) AS last_date 
 FROM CELL_VIABILITY
 GROUP BY Batch_ID
) AS cv1 
JOIN CELL_VIABILITY AS cv2 
  ON cv2.Batch_ID = cv1.Batch_ID AND 
     cv2.Date = cv1.last_date

Upvotes: 1

Related Questions