Ashwin Jacob
Ashwin Jacob

Reputation: 317

How to summarize historical data that has blanks in SQL?

I have some historical data from various sources that I want to compile down to a line graph.

Here is a sample SQL fiddle that I am working with https://www.db-fiddle.com/f/oprEgt1gEz55tY5xumU6Jz/1

However, there are various case scenarios I need to handle. In the below scenarios, I always have 3 sources(A, B, C). B and C are always bringing in values but A is the run that runs into issues.

The user asked for data from January 1 to January 5.

  1. Source A was unable to bring data on January 3. My line graph should show me values for January 1, 2, 4, and 5. It skips the 3 because no value for source A was present.
  2. Source A didn't provide any updates after January 3 due to a missing connection. My line graph should be showing values from January 1 and 2. No 3, 4, or 5 because we are missing values from Source A.
  3. Source A shut down after January 3. My line graph should be showing ALL values from January 1 to 5. Even though Source A shut down, we still want to see all the values.
  4. Source A comes up as a new data source after January 3. My line graph should be showing ALL values from January 1 to 5.
  5. Source A was only able to provide updates from January 3 onwards (even though they should have given us values on January 1 onwards). My line graph should be showing only values from January 3 to 5.

I am trying to do this in MySQL. I got a simple query using GROUP BY and HAVING however that is not enough to satisfy the use cases above. I was about writing a CTE with this or using PARTITION BY but once again stuck with not being able to satisfy all the conditions.

Upvotes: 0

Views: 67

Answers (1)

Sathiya Kumar V M
Sathiya Kumar V M

Reputation: 515

Why don't you skip the HAVING so that you will get all the data for Jan 3rd as well though there is no data in Source A on Jan 2.

SELECT sample_date, COUNT(source_data_id) AS NUMBER_OF_SOURCES, 
SUM(value) AS TOTAL_VALUE
FROM sample_history_data
WHERE SOURCE_DATA_ID IN ('A','B','C')
GROUP BY sample_date
ORDER BY sample_date ASC;
sample_date NUMBER_OF_SOURCES   TOTAL_VALUE
2001-01-01  3   30
2001-01-02  3   30
2001-01-03  2   20
2001-01-04  3   30
2001-01-05  3   30

Upvotes: 0

Related Questions