Wermerb
Wermerb

Reputation: 1049

Select data between 2 defined rows

I have a H2 database. I want to count an average fuel usage regarding the data what i have been provided. The problem is that what i got is really messy. This is 1 car's fuel usage data.

This is some sample data:

| Amount   | Date       | Start (km) | End (km) |
+----------+------------+------------+----------+
| 35.5     | 2012-02-02 | 65000      | null     |
| 36.7     | 2012-02-15 | null       | 66520    |
| 44.5     | 2012-02-18 | null       | null     |
| 33.8     | 2012-02-22 | 67000      | null     |
| 44.5     | 2013-01-22 | null       | null     |

To count the average fuel usage first I'm calculating the difference between the MIN(distance) and MAX(distance) to do so i have the following query:

SELECT 
   CASEWHEN((MAX(start)-MAX(end))>0, MAX(start), MAX(end)) 
    - 
   IFNULL(MIN(start),0) 
FROM fuel;

For the next step i would need to SUM(Amount), but how can i do that its only summing the rows between 67000 and 65000?

Any help is greatly appreciated.

Upvotes: 1

Views: 54

Answers (1)

JohnLBevan
JohnLBevan

Reputation: 24470

I'd approach it like this:

SELECT SUM([amount]) / SUM([end] - [start]) AS AverageFuelUsage
FROM [fuel]
WHERE [amount] IS NOT NULL
AND [start] IS NOT NULL
AND [end] IS NOT NULL

NB: this excludes a lot of data (in your sample data, all of it) - however that's important.

  • If you don't know the amount of fuel used for a journey, that doesn't mean no fuel was used, so defaulting to 0 is a bad idea; better to ignore this row and rely on complete data.
  • If you don't know the start or end reading, you don't know the distance; again you can't assume 0 so ignore this bad data.

If for all records you're missing at least one field you may be able to get away with the code below - but I don't devise it if you even 1% of your records have full data to work with.

SELECT AVG([amount]) / ( AVG([end]) - AVG([start]) ) AS AverageFuelUsage
FROM [fuel]

The idea here is if we assume that on a large data set the data averages out (i.e. most people travel similar distances, with starting and ending readings also tending towards some average) we can work off the averages for each. I'm not a statistician and would treat any result this gave with a lot of suspicion, but if may be the best you can get if you have only bad data to work with and need a result.

Update

Per discussion in comments, if you've recorded every journey and all readings are for the same vehicle you can find the first value with a [start], the last value with an [end], calculate the overall distance travelled in all those journeys, then sum all of the fuel used en-route.

--ideally date is unique
--if not this tries to work out the sequence of journeys based on start/end odometer readings
--if they're both null and fall on the same day as the final [end] reading, assumes the null reading journey was prior to the [end] one
declare @fuel table ([amount] float, [date] date, [start] int, [end] int)
insert @fuel
  values ( 35.5     , '2012-02-02' , 65000      , null     )
        ,( 36.7     , '2012-02-15' , null       , 66520    )
        ,( 44.5     , '2012-02-18' , null       , null     )
        ,( 33.8     , '2012-02-22' , 67000      , null     )
        ,( 44.5     , '2013-01-22' , null       , null     )

select j1.[start]
, jn.[end]
, sum(f.[amount]) [amount]
, sum(f.[amount]) / (jn.[end] - j1.[start]) LitresPerKm
, (jn.[end] - j1.[start]) / sum(f.[amount])  kmsPerLitre

from
(
    select top 1 [amount], [date], [start], [end]
    from @fuel
    where [start] is not null
    order by [start]
) j1 --first journey
cross join
(   
    select top 1 [amount], [date], [start], [end]
    from @fuel
    where [end] is not null
    order by [end] desc
) jn --last journey
inner join @fuel f
on f.[date] >= j1.[date]
and (f.[end] <= j1.[start] or f.[end] is null) --in case multiple journeys on the same day & this is before our first start
and f.[date] <= jn.[date] 
and (f.start <= jn.[end] or f.[start] is null) --in case multiple journeys on the same day & this is after our last end
group by j1.[start],jn.[end]

Upvotes: 1

Related Questions