Reputation: 63
I'm trying to optimize the following method by having it return only the data my program actually needs for its calculations. In the past the dataset hasn't been that large so I haven't needed to optimize this too much. More recently, my dataset grew from about 30k records to 700k records, so I'm trying to optimize this further.
public void readRawThresholdsInList(int inputtedESourceID, DateTime maxDateTimeVal, List<int> hashKey)
{
log.Info("Reading in raw thresholds from db");
using (FAI db= new FAI ())
{
rawThresholds = (from thr in db.Thresholds
where thr.CalculatedDate == maxDateTimeVal
where thr.ErrorSourceId == inputtedESourceID
where hashKey.Contains(thr.HashKey)
select thr).ToList();
}
log.Info("Read in " + rawThresholds.Count() + " threshold records for error source id: " + inputtedESourceID + ".");
}
I got the method to return about 200k rows from 700k by adding the hashKey.Contains(thr.HashKey), but I want to take this one step further by doing the contains on a combination of 2 fields instead of just 1. The catch is that this has to happen on the DB for it to improve my runtimes. There is already post processing that only acts on the rows the program will need.
I want to be able to give the method the following as an input and have the contains (or something similar) act on the new HashKeyHostId object:
public HashKeyHostId{public int hashKey; public int hostId;}
public void readRawThresholdsInList(int inputtedESourceID, DateTime maxDateTimeVal, List<HashKeyHostId> hashKeyHostIdPairs){
}
There's probably a very simple way of doing this that I'm not thinking of.
Edit:
In response to James's comment - I don't believe this would work. The hashkey variable would now be an object containing pairs of integers instead of just 1 integer. This alone would prevent you from doing a .contains because it's no longer a primitive. It'd be the "public HashKeyHostId{public int hashKey; public int hostId;}" object I posted above. The new requirement is that a hashkey/hostId combo have to match up to 1 record in the DB.
If you're suggesting I do what Janne said (give it a list of hash keys and a list of host ids), I'm fairly certain this would return all results where the hashkey belongs to any of the hostids in the second list (aka, any combination of the two lists). I need it to only return rows with the specified combinations, not combinations of anything in the two lists.
Edit2: Sample dataset: Hashkeys = 100,101,102,103 HostIds = 1,2,3,4,5,6,...,10000
I'd give it a list like
List<HashKeyHostId> data = new List<HashKeyHostId>()
{new HashKeyHostId(100,1),new HashKeyHostId(101,5)}
I believe the query Janne/James are suggesting would return records for any of those combinations (100,1; 100,5; 101,1; 101,5;). I need it to only return records for 100,1 and 101,5.
edit3: I tried doing a where hashkeyHostIdpair.Any(o=> o.hashkey==thr.HashKey && o.hostid==thr.HostId)", but that errored out with the same "Unable to create a constant value of type 'RTM_DB.HashKeyHostId'. Only primitive types are supported in this context." message. It doesnt look like you can do a .Any or a .contains on a list of non-primitive types. I even tried making my own .Any with a where clause and that threw the same exception. (where hashkeyHostIdpair.Where(o=>o.hostid==thr.HostId && o.hashkey==thr.HashKey).Count()>0))
edit4: Per Josh's suggestion I tried this:
rawThresholds=fai.Thresholds.Where(o=>o.CalculatedDate==maxDateTimeVal)
.Where(o=>o.ErrorSourceId==inputtedESourceID)
.Where(o=> hashkeyHostIdpair.Contains(new HashKeyHostId(){ hashkey=o.HashKey, hostid = o.HostId})).ToList();
but it errored out with {System.NotSupportedException: Unable to create a constant value of type 'RTM_DB.HashKeyHostId'. Only primitive types ('such as Int32, String, and Guid') are supported in this context
Upvotes: 0
Views: 179
Reputation: 2339
If you have something like
List<HashKeyHostId> data = new List<HashKeyHostId>() {
new HashKeyHostId { hashKey = 100, hostId = 1 },
new HashKeyHostId { hashKey = 101, hostId = 5 }
}
You can use it in a contains like this:
<somequery>.Where(x => data.Contains(new HashKeyHostId { hashKey = x.HashKey, hostId = x.HostId }))
Note the use of the object initializer syntax instead of a constructor.
This should get translated to SQL with each item in the list being appended to the WHERE clause like:
WHERE ([t0].[HashKey] = @p0 AND [t0].[HostId] = @p1) OR
([t0].[HashKey] = @p2 AND [t0].[HostId] = @p3) OR ...
Upvotes: 0
Reputation: 5121
Something along this would maybe be what you want to do?
public void readRawThresholdsInList(int inputtedESourceID, DateTime maxDateTimeVal, List<int> hashKeys, List<int> hostIds)
{
log.Info("Reading in raw thresholds from db");
using (var db = new FAI())
{
var rths = (from thr in db.Thresholds
where thr.CalculatedDate == maxDateTimeVal
&& thr.ErrorSourceId == inputtedESourceID
select thr);
if (hashKeys != null && hashKeys.Count() > 0)
rths = rths.Where(rth => hashKeys.Contains(rth.HashKey))
if (hostIds != null && hostIds.Count() > 0)
rths = rths.Where(rth => hostIds.Contains(rth.HostId)) // FieldName?
rawThresholds = rths.ToList();
}
log.Info("Read in " + rawThresholds.Count() + " threshold records for error source id: " + inputtedESourceID + ".");
}
-- edit --
You could do something like this, but I wouldnt recommend it. Figure out a value which you can multiply the HashKey safely so HostId will always be in the last digits
var filters = new int[] { 100 * 100 + 1 , 101 * 100 + 5 }; // 10001 & 10105
var rths = (from rth in db.Thresholds
where rth.CalculatedDate == maxDateTimeVal
&& rth.ErrorSourceId == inputtedESourceID
&& filters.Contains(rth.HashKey * 100 + rth.HostId)
select rth).ToList();
Upvotes: 0
Reputation: 82136
There's probably a very simple way of doing this that I'm not thinking of.
Yeah, there is
where hashKey.Contains(someValue) && hashKey.Contains(someOtherValue)
Upvotes: 1