Reputation: 4898
I'm not good at databases and T-sql queries so I got a little stumped over how to do something similar in C# using Linq.
Thing is I have this structure that is pretty much the same as a relational database table, with which I have to do some kind of join selection.
In effect I get a list of composite key addresses. These are actually classes that hold a few int values (byte or short perhaps but not relevant). Now I have to search through my structure for matches of these lists and call a method there.
This is probably a simple join (can't remember what join does what) but I need some help because I wan't this to be as cheap as I can easily get away with so I don't need to search through every line for every address.
public class TheLocationThing
{
int ColumnID;
int ColumnGroupId;
int RowID;
}
public class TheCellThing
{
TheLocationThing thing;
public void MethodINeedToCallIfInList()
{
//here something happens
}
}
public class TheRowThing
{
int RowId;
List<TheCellThing> CellsInThisRow;
}
public class TableThing
{
List<TheRowThing> RowsInThisTable;
}
So I have this tablething type class that has rows and they have cells. Notice that ColumnGroup thing, it's a composite key with ColumnId so the same columnid can come again but only once for each ColumnGroup.
The thing to keep in mind though is that inn TheTable there will only ever be one GroupColumnId, but the list given could have multiple, so we can filter them away.
public void DoThisThing()
{
List<TheLocationThing> TheAddressesINeedToFind = GetTheseAddresses(); //actualy is a TheLocationThing[] if that matters
var filterList = TheAddressesINeedToFind.Where(a => a.ColumnGroupId == this.CurrentActiveGroup);
//Here I have to do the join with this.TableInstance
}
Now, I should of course only loop through the addresses with the same row id in that row and all that.
Also is managing thing as IQueryable something that would help me out here, especially in the initial filter out, should i get it as Queryable?
Upvotes: 0
Views: 919
Reputation: 113262
I'm going to give different example, because I'm not quite following yours, and use it to explain the basics of joining, hopefully hitting what you need to learn.
Let's imagine two slightly more meaninfully-named classes than LocationThing etc. (which has me lost).
public class Language
{
string Code{get;set;}
string EnglishName{get;set;}
string NativeName{get;set;}
}
public class Document
{
public int ID{get; private set;}//no public set as it corresponds to an automatically-set column
public string LanguageCode{get;set;}
public string Title{get;set;}
public string Text{get;set;}
}
Now, let's also imagine we have methods GetLanguages()
and GetDocuments()
that return all languages and documents respectively. There's a few different ways that could work, and I'll get to that later.
An example of a join being useful, is if we e.g. wanted all the titles and all the English names of the languages they were in. For that in SQL we would use:
SELECT documents.title, languages.englishName
FROM languages JOIN documents
ON languages.code = documents.languageCode
Or leaving out table names where doing so doesn't make column-names ambiguous:
SELECT title, englishName
FROM languages JOIN documents
ON code = languageCode
Each of these will, for each row in documents, match them up with the corresponding row in languages, and return the title and English name of the combined row (if there's a document with no matching language, it doesn't get returned, if there's two languages with the same code - should be prevented by the db in this case though - corresponding documents get mentioned once for each).
The LINQ equivalent is:
from l in GetLanguages()
join d in GetDocuments()
on l.Code equals d.LanguageCode //note l must come before d
select new{d.Title, l.EnglishName}
This will similarly match each document with its corresponding language and return an IQueryable<T>
or IEnumerable<T>
(depending on the source enumerations/queryables) where T
is an anonymous object with Title
and EnglishName
properties.
Now, as to the expense of this. This depends primarily on the nature of GetLanguages()
and GetDocuments()
.
No matter what the source, this is inherently a matter of searching through every one of the results of those two methods - that's just the nature of the operation. However, the most efficient way of doing this, is still something that varies according to what we know about the source data. Let's consider a Linq2Objects form first. There's lots of ways that this could be done, but lets imagine they're returning List
s that were pre-computed:
public List<Document> GetDocuments()
{
return _precomputedDocs;
}
public List<Language> GetLanguages()
{
return _precomputedLangs;
}
Let's pretend Linq's join
doesn't exist for a moment, and imagine how we'd write something functionally equivalent to the code above. We might arrive at something like:
var langLookup = GetLanguages().ToLookup(l => l.Code);
foreach(var doc in GetDocuments())
foreach(var lang in langLookup[doc.LanguageCode])
yield return new{doc.Title, lang.EnglishName};
This is a reasonable general case. We can go one step further, and reduce storage, since we know that all we finally care about with each language is the English name:
var langLookup = GetLanguages().ToLookup(l => l.Code, l => l.EnglishName);
foreach(var doc in GetDocuments())
foreach(var englishName in langLookup[doc.LanguageCode])
yield return new{doc.Title, EnglishName = englishName};
That's about as much as we can do without special knowledge of the set of data.
If we did have special knowledge, we could go further. For example, if we knew there was only one language per code, then the following would be faster:
var langLookup = GetLanguages().ToDictionary(l => l.Code, l => l.EnglishName);
string englishName;
foreach(var doc in GetDocuments())
if(langLookup.TryGetValue(doc.LanguageCode, out englishName))
yield return new{doc.Title, EnglishName = englishName};
If we knew the two sources were both sorted by language code, we could go further still and spin through them both at the same time, yielding matches, and throwing away languages once we've dealt with them, as we're never going to need it again for the rest of the enumeration.
But, Linq does not have that special knowledge when just looking at two lists. For all it knows every single language and every single document all have the same codes. It really has to examine the lot to find out. For that, it's pretty efficient in how it does it (a bit better than my example above suggests, due to some optimisation).
Let's consider a Linq2SQL case, and note that Entity Framework and other ways of using Linq directly on databases would be comparable. Let's say all of this is happening in the context of a class that has a _ctx
member that's a DataContext
. Then our source methods could be:
public Table<Document> GetDocuments()
{
return _ctx.GetTable<Document>();
}
public Table<Language> GetLanguages()
{
return _ctx.GetTable<Languages>();
}
Table<T>
implements IQueryable<T>
along with some other methods. Here, instead of joining things in memory, it'll execute the following (bar some aliases) SQL:
SELECT documents.title, languages.englishName
FROM languages JOIN documents
ON languages.code = documents.languageCode
Look familiar? It's the same SQL we mentioned at the beginning.
First great thing about this, is that it's not bringing back anything from the database that we won't use.
Second great thing, is that the database's query engine (what turns this into executable code that it then runs) does have knowledge of the nature of the data. If for example we've set up the Languages
table to have a unique key or constraint on the code
column, the engine knows there can't be two languages with the same code, so it can perform the equivalent of the optimisation we mentioned above where we used a Dictionary
instead of a ILookup
.
Third great thing, is that if we have indices on languages.code
and documents.languageCode
then the query engine will use these for even faster retrieval and matching, perhaps getting all it needs from the index without hitting the table, making a call as to which table to hit first to avoid testing irrelevant rows in the second, and so on.
Fourth great thing, is that RDBMSs have benefited from several decades of research into how to make this sort of retrieval as fast as possible, so we've stuff going on that I don't know about and don't need to know about to benefit from.
In all then, we want to run our queries against the datasource directly, not against sources in memory. There are exceptions, particularly some forms of grouping (hitting the DB directly with some group-by operations can mean hitting it repeatedly) and if we reuse the same results over and over in quick succession (in which case we're better off hitting it once for those results, and then storing them).
Upvotes: 2