Reputation: 1049
I have a H2 database. I want to count an average fuel usage regarding the data what i have been provided. The problem is that what i got is really messy. This is 1 car's fuel usage data.
This is some sample data:
| Amount | Date | Start (km) | End (km) |
+----------+------------+------------+----------+
| 35.5 | 2012-02-02 | 65000 | null |
| 36.7 | 2012-02-15 | null | 66520 |
| 44.5 | 2012-02-18 | null | null |
| 33.8 | 2012-02-22 | 67000 | null |
| 44.5 | 2013-01-22 | null | null |
To count the average fuel usage first I'm calculating the difference between the MIN(distance) and MAX(distance) to do so i have the following query:
SELECT
CASEWHEN((MAX(start)-MAX(end))>0, MAX(start), MAX(end))
-
IFNULL(MIN(start),0)
FROM fuel;
For the next step i would need to SUM(Amount)
, but how can i do that its only summing the rows between 67000 and 65000?
Any help is greatly appreciated.
Upvotes: 1
Views: 54
Reputation: 24470
I'd approach it like this:
SELECT SUM([amount]) / SUM([end] - [start]) AS AverageFuelUsage
FROM [fuel]
WHERE [amount] IS NOT NULL
AND [start] IS NOT NULL
AND [end] IS NOT NULL
NB: this excludes a lot of data (in your sample data, all of it) - however that's important.
If for all records you're missing at least one field you may be able to get away with the code below - but I don't devise it if you even 1% of your records have full data to work with.
SELECT AVG([amount]) / ( AVG([end]) - AVG([start]) ) AS AverageFuelUsage
FROM [fuel]
The idea here is if we assume that on a large data set the data averages out (i.e. most people travel similar distances, with starting and ending readings also tending towards some average) we can work off the averages for each. I'm not a statistician and would treat any result this gave with a lot of suspicion, but if may be the best you can get if you have only bad data to work with and need a result.
Update
Per discussion in comments, if you've recorded every journey and all readings are for the same vehicle you can find the first value with a [start], the last value with an [end], calculate the overall distance travelled in all those journeys, then sum all of the fuel used en-route.
--ideally date is unique
--if not this tries to work out the sequence of journeys based on start/end odometer readings
--if they're both null and fall on the same day as the final [end] reading, assumes the null reading journey was prior to the [end] one
declare @fuel table ([amount] float, [date] date, [start] int, [end] int)
insert @fuel
values ( 35.5 , '2012-02-02' , 65000 , null )
,( 36.7 , '2012-02-15' , null , 66520 )
,( 44.5 , '2012-02-18' , null , null )
,( 33.8 , '2012-02-22' , 67000 , null )
,( 44.5 , '2013-01-22' , null , null )
select j1.[start]
, jn.[end]
, sum(f.[amount]) [amount]
, sum(f.[amount]) / (jn.[end] - j1.[start]) LitresPerKm
, (jn.[end] - j1.[start]) / sum(f.[amount]) kmsPerLitre
from
(
select top 1 [amount], [date], [start], [end]
from @fuel
where [start] is not null
order by [start]
) j1 --first journey
cross join
(
select top 1 [amount], [date], [start], [end]
from @fuel
where [end] is not null
order by [end] desc
) jn --last journey
inner join @fuel f
on f.[date] >= j1.[date]
and (f.[end] <= j1.[start] or f.[end] is null) --in case multiple journeys on the same day & this is before our first start
and f.[date] <= jn.[date]
and (f.start <= jn.[end] or f.[start] is null) --in case multiple journeys on the same day & this is after our last end
group by j1.[start],jn.[end]
Upvotes: 1