Reputation: 69
I have a table with cell viability data containing both numeric data and images (dose-response curves). The table can contain multiple rows for the same compound (uniqued by Batch ID). each row has a unique ID, as well as a date field. Now I want to group the data by Batch ID and produce the average EC50 values, but show this together with the last dose-response Curve generated for each compound Batch_ID. The code below will select the first Curve encountered for a particular compound Batch_ID. How can I select the last one instead, but still show it together with average EC50? Any tips appreciated!
SELECT Batch_ID, avg(EC50), Curve FROM CELL_VIABILITY GROUP BY Batch_ID
Example data:
ID Batch_ID EC50 Curve Date
1 ABC123 6.72 blob_1 18-06-15
2 ABC123 4.74 blob_2 18-07-10
3 ABC123 8.72 blob_3 18-08-22
4 DEF456 1.95 blob_4 18-06-15
5 DEF456 1.66 blob_5 18-07-10
6 DEF456 3.06 blob_6 18-08-22
Expected outcome:
Batch_ID EC50 Curve
ABC123 6.73 blob_3
DEF456 2.22 blob_6
Upvotes: 1
Views: 56
Reputation: 28834
Remember that data is unordered set. Without defining a specific order, we cannot determine what is last, first etc.
We can use the Date
column to define the Order. Latest updated entry (Maximum Date value) can be considered as "Last".
We can then use Correlated Subquery to determine the Last Curve for a specific Batch_ID
.
SELECT cv1.Batch_ID,
AVG(cv1.EC50),
MAX((SELECT cv2.Curve
FROM CELL_VIABILITY AS cv2
WHERE cv2.Batch_ID = cv1.Batch_ID
ORDER BY cv2.Date DESC LIMIT 1)) AS Last_Curve
FROM CELL_VIABILITY AS cv1
GROUP BY Batch_ID
Another approach would be using a Derived Table. We can get the last Date
value for every Batch_ID
. Then "Self-Join" to the table using the maximum value of the Date to get the Last Curve
:
SELECT
cv1.Batch_ID,
cv1.average,
cv2.Curve
FROM
(
SELECT Batch_ID,
AVG(EC50) AS average,
MAX(Date) AS last_date
FROM CELL_VIABILITY
GROUP BY Batch_ID
) AS cv1
JOIN CELL_VIABILITY AS cv2
ON cv2.Batch_ID = cv1.Batch_ID AND
cv2.Date = cv1.last_date
Upvotes: 1