User

Reputation: 3274

Strange Increment Behaviour in C#

Note: The code below is essentially nonsense and is for illustration purposes only.

Based on the fact that the right-hand side of an assignment must always be evaluated before its value is assigned to the left-hand side variable, and that increment operations such as ++ and -- are always performed right after evaluation, I would not expect the following code to work:

string[] newArray1 = new[] {"1", "2", "3", "4"};
string[] newArray2 = new string[4];

int IndTmp = 0;

foreach (string TmpString in newArray1)
{
    newArray2[IndTmp] = newArray1[IndTmp++];
}

Rather, I would expect newArray1[0] to be assigned to newArray2[1], newArray1[1] to newArray2[2], and so on, up to the point of throwing a System.IndexOutOfRangeException. Instead, and to my great surprise, the version that throws the exception is

string[] newArray1 = new[] {"1", "2", "3", "4"};
string[] newArray2 = new string[4];

int IndTmp = 0;

foreach (string TmpString in newArray1)
{
    newArray2[IndTmp++] = newArray1[IndTmp];
}

Since, in my understanding, the compiler first evaluates the RHS, assigns it to the LHS and only then increments, this is unexpected behaviour to me. Or is it really expected, and I am clearly missing something?

Upvotes: 24

Views: 1352

Answers (6)

Eric Lippert

Reputation: 660088

It is instructive to see exactly where your error is:

the right-hand side of an assignment must always be evaluated before its value is assigned to the left-hand side variable

Correct. Clearly the side effect of the assignment cannot happen until after the value being assigned has been computed.

increment operations such as ++ and -- are always performed right after evaluation

Almost correct. It is not clear what you mean by "evaluation" -- evaluation of what? The original value, the incremented value, or the value of the expression? The easiest way to think about it is that the original value is computed, then the incremented value, then the side effect happens. The final value of the expression is then either the original or the incremented value, depending on whether the operator was postfix or prefix. But your basic premise is pretty good: the side effect of the increment happens immediately after the final value is determined, and then the final value is produced.

You then seem to be concluding a falsehood from these two correct premises, namely, that the side effects of the left hand side are produced after the evaluation of the right hand side. But nothing in those two premises implies this conclusion! You've just pulled that conclusion out of thin air.

It would be more clear if you stated a third correct premise:

the storage location associated with the left-hand-side variable also must be known before the assignment takes place.

Clearly this is true. You need to know two things before an assignment can happen: what value is being assigned, and what memory location is being mutated. You can't figure those two things out at the same time; you have to figure out one of them first, and we figure out the one on the left hand side -- the variable -- first in C#. If figuring out where the storage is located causes a side effect then that side effect is produced before we figure out the second thing -- the value being assigned to the variable.

In short, in C# the order of evaluations in an assignment to a variable goes like this:

  • side effects of the left hand side happen and a variable is produced
  • side effects of the right hand side happen and a value is produced
  • the value is implicitly converted to the type of the left hand side, which may produce a third side effect
  • the side effect of the assignment -- the mutation of the variable to have the value of the correct type -- happens, and a value -- the value just assigned to the left hand side -- is produced.
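The first, second and fourth steps above can be observed with a small sketch. The class and helper names (EvaluationOrderDemo, GetTarget, GetIndex, GetValue) are made up for illustration and are not from the question; each helper only records that it was called.

```csharp
using System;
using System.Collections.Generic;

static class EvaluationOrderDemo
{
    // Records the order in which the hypothetical helpers below are called.
    public static readonly List<string> Log = new List<string>();

    static readonly string[] Target = new string[2];

    static string[] GetTarget() { Log.Add("target array"); return Target; }
    static int GetIndex() { Log.Add("target index"); return 0; }
    static string GetValue() { Log.Add("value"); return "x"; }

    public static List<string> Run()
    {
        Log.Clear();
        // The operands of the left-hand side are evaluated first, then the
        // right-hand side; only after that is the element actually written.
        GetTarget()[GetIndex()] = GetValue();
        return Log;
    }

    static void Main()
    {
        // prints "target array, target index, value"
        Console.WriteLine(string.Join(", ", Run()));
    }
}
```

The helpers stand in for any expression with a side effect, such as the IndTmp++ in the question.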

Upvotes: 12

Lasse V. Karlsen

Reputation: 391336

This is well-defined in the C# language according to Eric Lippert and is easily explained.

  1. First, everything in the left-hand expression that needs to be referenced and remembered is evaluated, and its side effects are applied
  2. Then the right-hand expression is evaluated

Note: The actual execution of the code might not look like this; the important thing to remember is that the compiler must create code that is equivalent to this.

So what happens in the second piece of code is this:

  1. Left-hand side:
    1. newArray2 is evaluated and the result is remembered (ie. the reference to whatever array we want to store things in is remembered, in case side-effects later change it)
    2. IndTemp is evaluated and the result is remembered
    3. IndTemp is increased by 1
  2. Right-hand side:
    1. newArray1 is evaluated and the result is remembered
    2. IndTemp is evaluated and the result is remembered (but this is 1 here)
    3. The array item is retrieved by indexing into the array from step 2.1 at index from step 2.2
  3. Back to left-hand side
    1. The array item is stored by indexing into the array from step 1.1 at index from step 1.2

As you can see, the second time IndTemp is evaluated (RHS), the value has already been increased by 1, but this has no impact on the LHS, since the LHS remembers the value 0 from before the increase.

In the first piece of code, the order is slightly different:

  1. Left-hand side:
    1. newArray2 is evaluated and the result is remembered
    2. IndTemp is evaluated and the result is remembered
  2. Right-hand side:
    1. newArray1 is evaluated and the result is remembered
    2. IndTemp is evaluated and the result is remembered (this is still 0 here)
    3. IndTemp is increased by 1
    4. The array item is retrieved by indexing into the array from step 2.1 at index from step 2.2
  3. Back to left-hand side
    1. The array item is stored by indexing into the array from step 1.1 at index from step 1.2

In this case, the increase of the variable at step 2.3 has no impact on the current loop iteration, and thus you will always copy from index N into index N, whereas in the second piece of code you will always copy from index N+1 into index N.
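As a rough sketch of the two step lists above, each assignment can be expanded by hand into temporaries. This is not what the compiler literally emits, only code that should behave the same way; the names target, targetIndex, source, sourceIndex and pass are invented for the illustration, and IndTmp is spelled as in the question.

```csharp
using System;

static class DesugaredAssignments
{
    // Hand expansion of: newArray2[IndTmp] = newArray1[IndTmp++];
    public static string[] FirstSnippet(string[] newArray1)
    {
        string[] newArray2 = new string[newArray1.Length];
        int IndTmp = 0;
        for (int pass = 0; pass < newArray1.Length; pass++)
        {
            string[] target = newArray2;        // LHS: remember the array
            int targetIndex = IndTmp;           // LHS: remember the index
            string[] source = newArray1;        // RHS: remember the array
            int sourceIndex = IndTmp;           // RHS: IndTmp++ yields the old value
            IndTmp = IndTmp + 1;                // RHS: side effect of IndTmp++
            string value = source[sourceIndex]; // RHS: read the element
            target[targetIndex] = value;        // finally, the store
        }
        return newArray2;
    }

    // Hand expansion of: newArray2[IndTmp++] = newArray1[IndTmp];
    public static string[] SecondSnippet(string[] newArray1)
    {
        string[] newArray2 = new string[newArray1.Length];
        int IndTmp = 0;
        for (int pass = 0; pass < newArray1.Length; pass++)
        {
            string[] target = newArray2;        // LHS: remember the array
            int targetIndex = IndTmp;           // LHS: IndTmp++ yields the old value
            IndTmp = IndTmp + 1;                // LHS: side effect happens before the RHS
            string[] source = newArray1;        // RHS: remember the array
            int sourceIndex = IndTmp;           // RHS: already one ahead of targetIndex
            string value = source[sourceIndex]; // throws on the last pass
            target[targetIndex] = value;        // finally, the store
        }
        return newArray2;
    }

    static void Main()
    {
        string[] input = { "1", "2", "3", "4" };
        Console.WriteLine(string.Join(",", FirstSnippet(input))); // prints "1,2,3,4"
        try
        {
            SecondSnippet(input);
        }
        catch (IndexOutOfRangeException)
        {
            Console.WriteLine("second snippet threw IndexOutOfRangeException");
        }
    }
}
```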

Eric has a blog entry titled Precedence vs order, redux that should be read.

Here is a piece of code that illustrates this. I basically turned the variables into properties of a class and implemented a custom "array" collection; all of them just dump what is happening to the console.

void Main()
{
    Console.WriteLine("first piece of code:");
    Context c = new Context();
    c.newArray2[c.IndTemp] = c.newArray1[c.IndTemp++];

    Console.WriteLine();

    Console.WriteLine("second piece of code:");
    c = new Context();
    c.newArray2[c.IndTemp++] = c.newArray1[c.IndTemp];
}

class Context
{
    private Collection _newArray1 = new Collection("newArray1");
    private Collection _newArray2 = new Collection("newArray2");
    private int _IndTemp;

    public Collection newArray1
    {
        get
        {
            Console.WriteLine("  reading newArray1");
            return _newArray1;
        }
    }

    public Collection newArray2
    {
        get
        {
            Console.WriteLine("  reading newArray2");
            return _newArray2;
        }
    }

    public int IndTemp
    {
        get
        {
            Console.WriteLine("  reading IndTemp (=" + _IndTemp + ")");
            return _IndTemp;
        }

        set
        {
            Console.WriteLine("  setting IndTemp to " + value);
            _IndTemp = value;
        }
    }
}

class Collection
{
    private string _name;

    public Collection(string name)
    {
        _name = name;
    }

    public int this[int index]
    {
        get
        {
            Console.WriteLine("  reading " + _name + "[" + index + "]");
            return 0;
        }

        set
        {
            Console.WriteLine("  writing " + _name + "[" + index + "]");
        }
    }
}

Output is:

first piece of code:
  reading newArray2
  reading IndTemp (=0)
  reading newArray1
  reading IndTemp (=0)
  setting IndTemp to 1
  reading newArray1[0]
  writing newArray2[0]

second piece of code:
  reading newArray2
  reading IndTemp (=0)
  setting IndTemp to 1
  reading newArray1
  reading IndTemp (=1)
  reading newArray1[1]
  writing newArray2[0]

Upvotes: 18

Steve Morgan

Reputation: 13091

ILDasm can be your best friend, sometimes ;-)

I compiled both of your methods and compared the resulting IL (the intermediate language that the C# compiler emits).

The important detail is in the loop, unsurprisingly. Your first method compiles and runs like this:

Code         Description                  Stack
ldloc.1      Load ref to newArray2        newArray2
ldloc.2      Load value of IndTmp         newArray2,0
ldloc.0      Load ref to newArray1        newArray2,0,newArray1
ldloc.2      Load value of IndTmp         newArray2,0,newArray1,0
dup          Duplicate top of stack       newArray2,0,newArray1,0,0
ldc.i4.1     Load 1                       newArray2,0,newArray1,0,0,1
add          Add top 2 values on stack    newArray2,0,newArray1,0,1
stloc.2      Update IndTmp                newArray2,0,newArray1,0     <-- IndTmp is 1
ldelem.ref   Load array element           newArray2,0,"1"
stelem.ref   Store array element          <empty>                     <-- newArray2[0] = "1"

This is repeated for each element in newArray1. The important point is that the location of the element in the source array has been pushed to the stack before IndTmp is incremented.

Compare this to the second method:

Code         Description                  Stack
ldloc.1      Load ref to newArray2        newArray2
ldloc.2      Load value of IndTmp         newArray2,0
dup          Duplicate top of stack       newArray2,0,0
ldc.i4.1     Load 1                       newArray2,0,0,1
add          Add top 2 values on stack    newArray2,0,1
stloc.2      Update IndTmp                newArray2,0     <-- IndTmp is 1
ldloc.0      Load ref to newArray1        newArray2,0,newArray1
ldloc.2      Load value of IndTmp         newArray2,0,newArray1,1
ldelem.ref   Load array element           newArray2,0,"2"
stelem.ref   Store array element          <empty>                     <-- newArray2[0] = "2"

Here, IndTmp is incremented before the location of the element in the source array has been pushed to the stack, hence the difference in behaviour (and the subsequent exception).

For completeness, let's compare it with

newArray2[IndTmp] = newArray1[++IndTmp];

Code         Description                  Stack
ldloc.1      Load ref to newArray2        newArray2
ldloc.2      Load IndTmp                  newArray2,0
ldloc.0      Load ref to newArray1        newArray2,0,newArray1
ldloc.2      Load IndTmp                  newArray2,0,newArray1,0
ldc.i4.1     Load 1                       newArray2,0,newArray1,0,1
add          Add top 2 values on stack    newArray2,0,newArray1,1
dup          Duplicate top stack entry    newArray2,0,newArray1,1,1
stloc.2      Update IndTmp                newArray2,0,newArray1,1  <-- IndTmp is 1
ldelem.ref   Load array element           newArray2,0,"2"
stelem.ref   Store array element          <empty>                     <-- newArray2[0] = "2"

Here, the result of the increment has been pushed to the stack (and becomes the array index) before IndTmp is updated.

In summary, it seems that the target of the assignment is evaluated first, followed by the source.

Thumbs up to the OP for a really thought-provoking question!

Upvotes: 21

Jonny Dee

Reputation: 849

Obviously the assumption that the rhs is always evaluated before the lhs is wrong. If you look here http://msdn.microsoft.com/en-us/library/aa691315(v=VS.71).aspx it seems like in the case of indexer access the arguments of the indexer access expression, which is the lhs, are evaluated before the rhs.

In other words, first it is determined where to store the result of the rhs; only then is the rhs evaluated.

Upvotes: 4

Ed Swangren

Reputation: 124642

It throws an exception because you start indexing into newArray1 at index 1. Since you are iterating over each element in newArray1 the last assignment throws an exception because IndTmp is equal to newArray1.Length, i.e., one past the end of the array. You increment the index variable before it is ever used to extract an element from newArray1, which means you will crash and also miss the first element in newArray1.
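To make this concrete, here is a small sketch that runs the throwing loop from the question inside a try/catch and reports what ended up in newArray2. The wrapper class SecondLoopTrace and the out parameters finalIndex and threw are invented for the illustration; the loop body is the one from the question.

```csharp
using System;

static class SecondLoopTrace
{
    public static string[] Run(out int finalIndex, out bool threw)
    {
        string[] newArray1 = { "1", "2", "3", "4" };
        string[] newArray2 = new string[4];
        int IndTmp = 0;
        threw = false;

        try
        {
            foreach (string TmpString in newArray1)
            {
                // The index on the left is taken before the increment,
                // the index on the right after it.
                newArray2[IndTmp++] = newArray1[IndTmp];
            }
        }
        catch (IndexOutOfRangeException)
        {
            // Reached on the fourth pass, when newArray1[4] is read.
            threw = true;
        }

        finalIndex = IndTmp;
        return newArray2;
    }

    static void Main()
    {
        string[] result = Run(out int finalIndex, out bool threw);
        string[] shown = Array.ConvertAll(result, s => s ?? "null");
        Console.WriteLine(string.Join(", ", shown)); // prints "2, 3, 4, null"
        Console.WriteLine(finalIndex);               // prints "4"
        Console.WriteLine(threw);                    // prints "True"
    }
}
```

The first element "1" is never copied, and the last slot of newArray2 stays null because the read on the right-hand side fails before the store happens.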

Upvotes: 3

Petar Ivanov

Reputation: 93030

newArray2[IndTmp] = newArray1[IndTmp++];

leads to first assigning and then incrementing the variable.

  1. newArray2[0] = newArray1[0]
  2. increment
  3. newArray2[1] = newArray1[1]
  4. increment

and so on.

The postfix ++ operator (the ++ written to the right of the variable) increments right away, but it returns the value from before the increment. The value used to index into the array is the value returned by the postfix ++, so the non-incremented value.

What you describe (the exception thrown) would be the result of a prefix ++ (the ++ written to the left of the variable):

newArray2[IndTmp] = newArray1[++IndTmp]; //throws exception
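The difference between the two forms of ++ can be checked in isolation with a minimal sketch; the class name IncrementValues and the variable names are illustrative only.

```csharp
using System;

static class IncrementValues
{
    // Returns { value of i++, value of ++i, final i }, starting from i = 0.
    public static int[] Demo()
    {
        int i = 0;
        int fromPostfix = i++; // yields the old value 0; i is now 1
        int fromPrefix = ++i;  // i becomes 2, and 2 is the value yielded
        return new[] { fromPostfix, fromPrefix, i };
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" ", Demo())); // prints "0 2 2"
    }
}
```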

Upvotes: 13
