Moskus
Moskus

Reputation: 79

Linq on generic List takes a lot of time

I have a database table that gets 7 different values from 6 different measuring stations (that's 42 values) every second. The data isn't stored with a timestamp, just a "TickNumber" and then the time is calculated from the TickNumber and the time when the measurement was started.

I have no control over this.

However, to speed things up, I download the data, analyze it and store it in another database with an asp.net frontend. This system works great and everybody's happy.

However, the analysis part is taking forever, and after spending some time with the Performance Analyzer I've found the problem.

This fetches the data and returns a List.

Public Shared Function GetMeasuredValues(ByVal _startdate As Date, ByVal _enddate As Date) As List(Of MeasuredValues)
    Dim _db As New Quickview

    Dim functions() As Integer = System.Enum.GetValues(GetType(Enums.MeasuredValueTypes))
    Dim total_values As New List(Of MeasuredValues)

    'Finding max and min row values
    Dim stations() As Integer = {1, 2, 3, 4, 6, 16}
    For Each i In stations
        Dim station As Integer = i
        Dim local_start As Integer = DB.DateToPeriodNo(station, _startdate)
        Dim local_end As Integer = DB.DateToPeriodNo(station, _enddate)

        If local_start > 0 Then
            Dim all_values = (From vls In _db.MeasuredValues
                              Where vls.MeasValueId = station _
                              And functions.Contains(vls.FuncId) _
                              And vls.PeriodNo >= local_start And vls.PeriodNo <= local_end _
                              ).ToList
            Console.WriteLine("Data count for station " & i & ": " & all_values.Count)
            total_values.AddRange(all_values)
        End If
    Next

    Dim sorted_values = (From vls In total_values
                         Order By vls.Time Ascending, vls.MeasValueId Ascending).ToList

    Return sorted_values
End Function

This works OK. There's a lot of data, and transferring the data is taking up most of the consumed time at this step.

This data is then filtered to give me values from one hour (07:00 to 07:59, etc). I use those values to calculate the averages and sums needed for that hour. Sadly much of this is logarithmic so I can't use. Sum, etc.

Then I do:

Dim all_values = DB.GetMeasuredValues(date_start, date_end)

.... which just gives me a list of all the values I need.

Here's the problem, this query seems to take forever.

''' [in for-loop going through each hour between date_start and date_end, typically 24 hours]
Dim values_hour = (From vls In all_values
                   Where vls.MeasValueId = station _
                   And vls.FuncId = Func _
                   And vls.Time >= time_start And vls.Time < time_end).ToList

If I am to trust the Performance Analyzer, this simple query takes 97% of the resources. My calculations don't seem to have any impact (<0.2%) at all.

I'm sure I'm doing something wrong, but what?

Upvotes: 1

Views: 147

Answers (2)

Gert Arnold
Gert Arnold

Reputation: 109079

As you have the data sorted by time you could use SkipWhile/TakeWhile to get a time chunk out of them and then apply the other filters. Thus you enumerate the bulk data once to get the required times, and only apply the filters to this subset of the data:

Dim slice = all_values _
            .SkipWhile(Function(vls) vls.Time < time_start) _
            .TakeWhile(Function(vls) vls.Time < time_end)

and then filter by Func and station.

Upvotes: 2

Matt Ko
Matt Ko

Reputation: 979

The call to ToList is what will cause a lot of overhead here because it needs to allocate the memory for the list every time, then fill that list.
Have you tried removing that?
On top of that, I would use total_values.Concat(all_values) instead of total_values.AddRange(all_values) and then just call the ToList at the very end like you already do.

And I would only write the data count if in debug mode or something similar so that you don't loose time there either when performance is important.

Upvotes: 2

Related Questions