Alan Baljeu
Alan Baljeu

Reputation: 2443

LINQ query becomes invalid upon editing result data?

(Please see the answer I wrote for more understanding of the situation.)

Below is a query which works great, operating on selected rows of table STUDENTS. Then one edit destroys the query variable. What's wrong here?

students is rows selected from an import Datatable defined in part by:

    importTable.Columns.Add("SECTION", typeof(string));
    importTable.Columns.Add("NUMBER", typeof(string));
    importTable.Columns.Add("ID", typeof(string));

(Because the DataTable is untyped, I need to cast the data into string to use the fields).

Then called by:

IEnumerable<DataRow> s = importTable.AsEnumerable();
IEnumerable<DataRow> t = s
  .OrderBy(r => r["HALL"]);
IEnumerable<DataRow> sortedTable = t
  .OrderBy(r => 
          { //if (r["ID"] is DBNull)
            //  return "";
            //else
              return r["ID"]; // ERROR
          });

IEnumerable<DataRow> tue = sortedTable.Where(r => r["DAY"].Equals("TUE"));
IEnumerable<DataRow> wed = sortedTable.Where(r => r["DAY"].Equals("WED"));
AssignSections(tue);
AssignSections(wed);

Here is the query:

public void AssignSections(IEnumerable<DataRow> students)
{
  IEnumerable<IEnumerable<DataRow>> query = from e in students.AsEnumerable()
              orderby (e["SHORTSCHOOL"]  as string).PadRight(30) + e["SEED"] as string
              group e by new { DAY=e["DAY"], GRADE=e["GRADE"] } into g
              orderby g.Key.GRADE as string
              select g.AsEnumerable();

  var queryList = query.ToList(); // ArgumentException during "WED" call

  foreach (var grade in query)
    foreach (var student in grade)
      if (student["ID"] == DBNull.Value)
      {
        student["SECTION"] = "S";
        student["ID"] = "ID1";
      }
}

Assigning SECTION works, NO PROBLEM. Assigning ID causes query to look like: Query after assignment

query now appears invalid. Future uses of query also prove to be invalid (though the foreach finishes fine). For what it's worth, grade is just fine, but students is also invalidated through the original table seems to be fine as well.

Upvotes: 2

Views: 244

Answers (3)

Ivan Stoev
Ivan Stoev

Reputation: 205629

No magic here. It's a combination of LINQ query Deferred Execution and the usage of the DBNull, which cannot be compared to/from other types.

The deferred execution has been explained many times, so I'm not going to spend time on it. Shorty, the query is executed the only (but anytime) when enumerated. Enumerating means foreach, ToList etc. and technically speaking happens when the GetEnumerator of the enumerable (or the first MoveNext of the enumerator) is called.

All you need to remember from the above is that the IEnumerable<T> (or IQueryable<T>) returning LINQ queries are not executed (evaluated) at the time you define them, but every time you enumerate them (directly or indirectly). This should explain the "The answer surprisingly to me is that LINQ reorders code" part from your own answer. No, LINQ does not reorder the code, it's your code which is doing that by reevaluting the LINQ queries at certain points which are different from the place where you define your query variables. If you want to evaluate them just once at specific point, then do that by adding ToList, ToArray and similar methods which enumerate the query and store the result in some in memory collection and use that collection for further processing. It still be IEnumerable<T>, but further enumerations would enumerate the query result rather than reevaluate the query.

The main issue is the DBNull. From your explanations looks like initially all the ID values are DBNull, so the first query runs fine (DBNull knows how to compare to itself :). Once the source contains at least one value which is not DBNull, any further query that uses OrderBy that column with the default IComparer will fail.

It can easily be reproduced w/o data tables with the following simple code:

var data = new[]
{
    new { Id = (object)DBNull.Value },
    new { Id = (object)DBNull.Value }
};
var query = data.OrderBy(e => e.Id);

query.ToList(); // Success

data[1] = new { Id = (object)"whatever" };
query.ToList(); // Fail

showing the deferred query execution and reevaluation, or directly (to prove that the problem is not with editing):

new[]
{
    new { Id = (object)DBNull.Value },
    new { Id = (object)"whatever" }
}
.OrderBy(e => e.Id)
.ToList(); // Fail

The solution is to avoid DBNull at all. The easiest (and much better than as string or ToString()) with DataTable is to use DataRowExtensions.Field extension methods instead of object returning indexer, which besides providing strongly typed access to the columns also automatically handle DBNulls for you (converts them to null when you request string or nullable type), so you won't experience such issues.

It can be proved by changing your problematic code to

.OrderBy(r => r.Field<string>("ID"))

and the problem will be gone. I strongly recommend doing that for other column accessors as well.

Upvotes: 4

George Vovos
George Vovos

Reputation: 7618

DBNull and null is not the same...
As your original error message says "Object must be of type string" (to be assigned to a string)
DBNull can't be cast to a string,it is a class...
You need to handle this case in your code. See this link for a simple helper method:
Unable to cast object of type 'System.DBNull' to type 'System.String

using System;
namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {

            DBNull dbNull = DBNull.Value;
            Console.WriteLine(typeof(string).IsAssignableFrom(typeof(DBNull)));//False
            Console.WriteLine(dbNull is string); //False

            //Console.WriteLine((string)dbNull);  // compile time error
            //Console.WriteLine(dbNull as string); // compile time error

            Console.ReadLine();
        }
    }
}

Also, make sure you read how "Lazy Loading"/"Deferred Execution" works on LINQ/IEnumerable.
You don't have to use IEnumerable all the time,especially if you are not sure how it works.

Upvotes: 0

Alan Baljeu
Alan Baljeu

Reputation: 2443

The answer surprisingly to me is that LINQ reorders code. The context was this:

  IEnumerable<DataRow> s = importTable.AsEnumerable();
  IEnumerable<DataRow> t = s
      .OrderBy(r => r["HALL"]);
  IEnumerable<DataRow> sortedTable = t
      .OrderBy(r => 
              { //if (r["ID"] is DBNull)
                //  return "";
                //else
                  return r["ID"]; // ERROR
              });

  IEnumerable<DataRow> tue = sortedTable.Where(r => r["DAY"].Equals("TUE"));
  IEnumerable<DataRow> wed = sortedTable.Where(r => r["DAY"].Equals("WED"));
  AssignSections(tue);
  AssignSections(wed);

The 3 commented lines indicate the fault. And what happened: sortedTable was partially initialized in order to feed the Where clause for initializing tue. But then the sortedTable was completed to initialize wed AFTER the call to assign wed appeared in the code, but just in time to use wed in the query constructed in AssignSections!

So the ERROR arose during AssignSections, when the code detoured to completing the initializing of sortedTable, and I could detect this by adding the 3 disabled lines and setting a breakpoint on the "return "";

Magic?

Upvotes: 0

Related Questions