Reputation: 335

Understanding Unsafe code and its uses

I am currently reading ECMA-334, as suggested by a friend who programs for a living. I am on the section dealing with unsafe code, but I am a bit confused by what it is describing.

The garbage collector underlying C# might work by moving objects around in memory, but this motion is invisible to most C# developers. For developers who are generally content with automatic memory management but sometimes need fine-grained control or that extra bit of performance, C# provides the ability to write “unsafe” code. Such code can deal directly with pointer types and object addresses; however, C# requires the programmer to fix objects to temporarily prevent the garbage collector from moving them. This “unsafe” code feature is in fact a “safe” feature from the perspective of both developers and users. Unsafe code shall be clearly marked in the code with the modifier unsafe, so developers can't possibly use unsafe language features accidentally, and the compiler and the execution engine work together to ensure that unsafe code cannot masquerade as safe code. These restrictions limit the use of unsafe code to situations in which the code is trusted.

The example

using System;
class Test
{
    static void WriteLocations(byte[] arr)
    {
        unsafe
        {
            fixed (byte* pArray = arr)
            {
                byte* pElem = pArray;
                for (int i = 0; i < arr.Length; i++)
                {
                    byte value = *pElem;
                    Console.WriteLine("arr[{0}] at 0x{1:X} is {2}",
                    i, (uint)pElem, value);
                    pElem++;
                }
            }
        }
    }
    static void Main()
    {
        byte[] arr = new byte[] { 1, 2, 3, 4, 5 };
        WriteLocations(arr);
        Console.ReadLine();
    }
}

shows an unsafe block in a method named WriteLocations that fixes an array instance and uses pointer manipulation to iterate over the elements. The index, value, and location of each array element are written to the console. One possible example of output is:
arr[0] at 0x8E0360 is 1
arr[1] at 0x8E0361 is 2
arr[2] at 0x8E0362 is 3
arr[3] at 0x8E0363 is 4
arr[4] at 0x8E0364 is 5
but, of course, the exact memory locations can be different in different executions of the application.

Why is knowing the exact memory locations of, for example, this array beneficial to us as developers? Could someone explain this idea in a simplified context?

Upvotes: 6

Answers (3)

Alex

Reputation: 13224

In general, the exact memory locations within an "unsafe" block are not so relevant.

As explained in Dai's answer, when you are using memory managed by the garbage collector, you need to make sure that the data you are manipulating does not get moved (using "fixed"). You generally use this when:

You are running a performance critical operation many times in a loop, and manipulating raw byte structures is sufficiently faster.
You are doing interop and have some non-standard data marshaling needs.

In some cases you are working with memory that is not managed by the garbage collector. Some examples of such scenarios are:

When doing interop with unmanaged code, it can be used to prevent repeatedly marshaling data back and forth, and instead do some work in larger granularity chunks, using the "raw bytes", or structs mapped to these raw bytes.
When doing low level IO with large buffers that you need to share with the OS (e.g. for scatter/gather IO).
When creating specific structures in a memory mapped file. An example for instance could be a B+Tree with memory page sized nodes, that is stored in a disk based file that you want to page into memory.
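
As a rough sketch of the "structs mapped to raw bytes" idea, the following allocates a block of unmanaged memory and views it through a struct pointer. NodeHeader and UnmanagedDemo are hypothetical names invented for this illustration, and the code assumes the project is compiled with unsafe code enabled.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical fixed-layout record, used only for this illustration.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct NodeHeader
{
    public int KeyCount;
    public long NextPage;
}

static class UnmanagedDemo
{
    public static unsafe long RoundTrip()
    {
        // Memory from AllocHGlobal lives outside the GC heap, so it never
        // moves and no fixed statement is needed to take its address.
        IntPtr buffer = Marshal.AllocHGlobal(sizeof(NodeHeader));
        try
        {
            // View the raw bytes as a NodeHeader and write through the pointer.
            NodeHeader* header = (NodeHeader*)buffer.ToPointer();
            header->KeyCount = 3;
            header->NextPage = 42;
            return header->KeyCount + header->NextPage;
        }
        finally
        {
            // Unmanaged memory is not collected; it must be freed explicitly.
            Marshal.FreeHGlobal(buffer);
        }
    }
}
```

Calling UnmanagedDemo.RoundTrip() writes the two fields and reads them back through the same pointer. The same pattern would apply to a pointer obtained from a memory mapped view, although that is not shown here.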

Upvotes: 2

Dai

Reputation: 155270

The fixed language feature is not so much "beneficial" as "absolutely necessary".

Ordinarily a C# user will imagine reference types as being equivalent to single-indirection pointers (e.g. for class Foo, this: Foo foo = new Foo(); is equivalent to this C++: Foo* foo = new Foo();).

In reality, a reference in C# behaves more like a handle than a raw address. The GC not only cleans up unused objects but also moves objects around in memory to avoid fragmentation, and it updates every managed reference to a moved object when it does so. Conceptually this resembles a double-indirection pointer through an object table, although the CLR itself stores references as direct pointers that the GC rewrites during compaction.

All this is well and good if you're exclusively using object references in C#. As soon as you use pointers you've got a problem, because the GC does not update raw pointers when it moves an object, and it could run at any point in time, even during tight-loop execution. When the GC runs, your program's execution is frozen (which is why the CLR and Java are not suitable for hard real-time applications: a GC pause can last a few hundred milliseconds in some cases).

Because of this inherent behaviour (an object can be moved while your code is executing), you need to prevent that object from being moved, hence the fixed keyword, which instructs the GC not to move that object.

An example:

unsafe void Foo() {

    Byte[] safeArray = new Byte[ 50 ];
    safeArray[0] = 255;
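    // Illustration only: this method does not compile as written, because the
    // compiler rejects taking the address of an array element outside a fixed statement.
    Byte* p = &safeArray[0];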

    Console.WriteLine( "Array address: {0}", &safeArray );
    Console.WriteLine( "Pointer target: {0}", p );
    // These will both print "0x12340000".

    while( executeTightLoop() ) {
        Console.WriteLine( *p );
        // valid pointer dereferencing, will output "255".
    }

    // Pretend at this point that GC ran right here during execution. The safeArray object has been moved elsewhere in memory.

    Console.WriteLine( "Array address: {0}", &safeArray );
    Console.WriteLine( "Pointer target: {0}", p );
    // These two printed values will differ, demonstrating that p is invalid now.
    Console.WriteLine( *p );
    // the above code now prints garbage (if the memory has been reused by another allocation) or causes the program to crash (if it's in a memory page that has been released, an Access Violation)
}

So by applying fixed to the safeArray object, the pointer p remains valid for the duration of the fixed block and will neither cause a crash nor read garbage data.
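
A minimal sketch of the pinned version might look like the following. FixedDemo and SumWhilePinned are names invented for this illustration, the explicit GC.Collect() call is there only to provoke a collection while the pointer is live, and the code assumes unsafe code is enabled for the project.

```csharp
using System;

static class FixedDemo
{
    public static unsafe int SumWhilePinned()
    {
        byte[] safeArray = new byte[50];
        safeArray[0] = 255;
        int total = 0;

        // Inside this block the GC is not allowed to relocate safeArray,
        // so p keeps pointing at element 0.
        fixed (byte* p = safeArray)
        {
            for (int i = 0; i < 3; i++)
            {
                GC.Collect();   // force a collection; the pinned array stays put
                total += *p;    // still reads the value stored in safeArray[0]
            }
        }
        // p is out of scope here and the array is no longer pinned.
        return total;
    }
}
```

Each of the three reads sees 255 even though a collection ran just before it. The pointer is only declared inside the fixed statement, so it cannot be used after the array has been unpinned.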

Side-note: An alternative to fixed is to use stackalloc, but that limits the object lifetime to the scope of your function.
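
For comparison, here is a small sketch of the stackalloc alternative. StackallocDemo and SumOfSquares are hypothetical names, and the example assumes a small count, since a large stack allocation would overflow the stack.

```csharp
using System;

static class StackallocDemo
{
    public static unsafe int SumOfSquares(int count)
    {
        // stackalloc reserves memory in the current stack frame. The GC never
        // moves it, so no fixed statement is needed, but the memory is gone
        // as soon as this method returns.
        int* squares = stackalloc int[count];

        for (int i = 0; i < count; i++)
        {
            squares[i] = i * i;
        }

        int total = 0;
        for (int i = 0; i < count; i++)
        {
            total += squares[i];
        }
        return total;
    }
}
```

SumOfSquares(4) adds 0, 1, 4 and 9. The pointer must not be returned or stored anywhere that outlives the method.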

Upvotes: 8

jaket

Reputation: 9341

One of the primary reasons I use fixed is for interfacing with native code. Suppose you have a native function with the following signature:

double cblas_ddot(int n, double* x, int incx, double* y, int incy);

You could write an interop wrapper like this:

public static extern double cblas_ddot(int n, [In] double[] x, int incx, 
                                       [In] double[] y, int incy);

And write C# code to call it like this:

double[] x = ...
double[] y = ...
cblas_ddot(n, x, 1, y, 1);

But now suppose I wanted to operate on some data in the middle of my array, say starting at x[2] and y[2]. There is no way to make the call without copying the array.

double[] x = ...
double[] y = ...
cblas_ddot(n, x[2], 1, y[2], 1);
              ^^^^
              this wouldn't compile

In this case fixed comes to the rescue. We can change the signature of the interop and use fixed from the caller.

public unsafe static extern double cblas_ddot(int n, [In] double* x, int incx, 
                                              [In] double* y, int incy);

double[] x = ...
double[] y = ...
fixed (double* pX = x, pY = y)
{
    cblas_ddot(n, pX + 2, 1, pY + 2, 1);
}
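
To try the offset trick without a native BLAS library, a sketch like the following can stand in. Ddot here is a hypothetical managed substitute with the same parameter list as cblas_ddot, not the real native function, and DotDemo and DotFromOffset are likewise invented names. It assumes unsafe code is enabled.

```csharp
using System;

static class DotDemo
{
    // Hypothetical managed stand-in for the native cblas_ddot, so that the
    // sketch runs without any native library being present.
    static unsafe double Ddot(int n, double* x, int incx, double* y, int incy)
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
        {
            sum += x[i * incx] * y[i * incy];
        }
        return sum;
    }

    public static unsafe double DotFromOffset(double[] x, double[] y, int offset, int n)
    {
        // The caller must ensure offset + n does not exceed either array's
        // length; pointer arithmetic is not bounds checked.
        fixed (double* pX = x, pY = y)
        {
            // Pointer arithmetic selects the middle of each array without copying.
            return Ddot(n, pX + offset, 1, pY + offset, 1);
        }
    }
}
```

With x = { 1, 2, 3, 4, 5 } and y = { 10, 20, 30, 40, 50 }, DotFromOffset(x, y, 2, 3) multiplies the last three elements of each array pairwise and sums them.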

I've also used fixed in rare cases where I needed fast loops over arrays and had to ensure that .NET array bounds checking was not happening.
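
A loop of that kind could be sketched as follows. SumDemo and Sum are hypothetical names, and the code assumes unsafe code is enabled. Whether this is actually faster is something to measure, since the JIT can often remove bounds checks from a plain for loop over an array on its own.

```csharp
using System;

static class SumDemo
{
    public static unsafe long Sum(int[] values)
    {
        long total = 0;
        fixed (int* p = values)
        {
            // Walk the array with a raw pointer; no bounds checks are
            // performed, so staying inside the array is our responsibility.
            int* end = p + values.Length;
            for (int* cur = p; cur < end; cur++)
            {
                total += *cur;
            }
        }
        return total;
    }
}
```

For example, Sum(new[] { 1, 2, 3, 4 }) adds the four elements through the pointer.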

Upvotes: 2
