Lieven Cardoen
Lieven Cardoen

Reputation: 25959

Getting an excpetion when Marshalling Strings from C++ to C#

When marshalling a fixed size char from C to C#, everything seems to work fine. However, when I try to marshall a dynamically allocated string array from C to C#, I'm getting an exception.

In the first case I have a C struct with a

struct SyntaxTree {
    int nodeType;
    char TypeString[64];
    char DataTypeString[64];
    char *ValueString;
    int NodeStart;
    int NodeEnd;
    int TokenStart;
    int TokenEnd;
    int Errno;
    char ParameterBuffer[256];

    int NodeHandle;
    struct SyntaxTree* sibling; /* first sibling */
    struct SyntaxTree* child; /* first child */

    SyntaxTree(sn_Node* node) {
        nodeType = node->Type;
        strcpy(this->TypeString, "\0");
        strcpy(this->DataTypeString, "\0");
        /*strcpy(this->ValueString, "\0");*/
        strcpy(this->ParameterBuffer, "\0");
        sibling = 0;
        child = 0;
    }
};
typedef struct SyntaxTree SyntaxTree;

And in C#

namespace MISPLEditor
{
    using System;
    using System.Runtime.InteropServices;

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi)]
    public struct ManagedTree
    {
        /// <summary>
        /// The node type.
        /// </summary>
        [MarshalAs(UnmanagedType.U4)]
        public int NodeType;

        /// <summary>
        /// The type string.
        /// </summary>
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 64)]
        public string TypeString;

        /// <summary>
        /// The data type string.
        /// </summary>
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 64)]
        public string DataTypeString;

        /// <summary>
        /// The value string.
        /// </summary>
        [MarshalAs(UnmanagedType.LPStr]
        public string ValueString;

        /// <summary>
        /// The node start.
        /// </summary>
        [MarshalAs(UnmanagedType.U4)]
        public int NodeStart;

        /// <summary>
        /// The node end.
        /// </summary>
        [MarshalAs(UnmanagedType.U4)]
        public int NodeEnd;

        /// <summary>
        /// The token start.
        /// </summary>
        [MarshalAs(UnmanagedType.U4)]
        public int TokenStart;

        /// <summary>
        /// The token end.
        /// </summary>
        [MarshalAs(UnmanagedType.U4)]
        public int TokenEnd;

        /// <summary>
        /// The errno.
        /// </summary>
        [MarshalAs(UnmanagedType.U4)]
        public int Errno;

        /// <summary>
        /// The parameter buffer.
        /// </summary>
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 256)]
        public string ParameterBuffer;

        /// <summary>
        /// The node handle.
        /// </summary>
        [MarshalAs(UnmanagedType.U4)]
        public int NodeHandle;

        /// <summary>
        /// The sibling.
        /// </summary>
        public IntPtr Sibling;

        /// <summary>
        /// The child.
        /// </summary>
        public IntPtr Child;
    }
}

The exception I'm getting is:

An unhandled exception of type 'System.AccessViolationException' occurred in mscorlib.dll

Additional information: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

I don't seem to get any additional information here. There's definitely a value in ValueString. I'm more a C# expert than a C expert, so I don't know whether the ValueString char is actually initialized in a good way.

case(SN__NOD_STRING):
  strcpy(TypeString, "STRING");
  char *PoolString;
  PoolString = sn_GetPoolString(&SyntaxTree->StringPool, 
      Node->Value.Constant.Value.StringHandle);
  ValueString = (char * ) CoTaskMemAlloc(strlen(PoolString) + 1);
  strcpy(ValueString, PoolString);
  break;

In C#:

var unmanagedPtr = UnmanagedMisplApi.sn_parse(
  builder, tableBuilder, sourcePrefix.Length, error, info, errorParams);
SyntaxTree root;

if (unmanagedPtr != IntPtr.Zero)
{
  /* Marshal unmanagedPtr into managed struct */
  var tree = (ManagedTree)Marshal.PtrToStructure(
    unmanagedPtr, typeof(ManagedTree));

Filling the C struct is done by a recursive function. Actually, first lex and yacc are used to pare a string of code. Then the parse tree is converted to the SyntaxTree struct. In this code, the sn_NodeToString triggers the code with the case(SN__NOD_STRING) from above. The code for the recursive function is:

SyntaxTree* BuildTree(sn_NodeHandle nodeHandle, sn_SyntaxTree* snSyntaxTree) {

    SyntaxTree *first, *second, *third;
    sn_Pool* pool = &snSyntaxTree->NodePool;
    first = 0; second = 0; third = 0;

    // ----------------------------------------------------------

    /* NOTE (+ further uses): this operation is inherently not 64-bit safe! */
    sn_Node* node = (sn_Node*) sn_GetPoolElement(
        &snSyntaxTree->NodePool, nodeHandle);

    if(!node) return (SyntaxTree*) NULL;

    SyntaxTree* tree = new SyntaxTree(node);
    tree->child = 0;
    tree->sibling = 0;

    /* sn_NodeToString triggers the code with the case(SN__NOD_STRING): */
    sn_NodeToString(node, snSyntaxTree, tree->TypeString, 
        tree->DataTypeString, tree->ValueString);
    tree->NodeStart = node->NodeSpan.NodeStart - SubtractCodeLength;
    tree->NodeEnd = node->NodeSpan.NodeEnd - SubtractCodeLength;
    tree->TokenStart = node->NodeSpan.TokenStart;
    tree->TokenEnd = node->NodeSpan.TokenEnd;
    tree->NodeHandle = nodeHandle;
    tree->Errno = node->Value.ErrNo;
    if(node->ErrorArgs)
        strcpy(tree->ParameterBuffer, node->ErrorArgs);

    int arity = sn_GetArity(node);
    switch(arity) {
        case 0:
            break;
        case 1:
            first = BuildTree(UNOP(node), snSyntaxTree);
            tree->child = first;
            break;
        case 2:
            first = BuildTree(LEFTOP(node), snSyntaxTree);
            second = BuildTree(RIGHTOP(node), snSyntaxTree);
            if(first) { /* Todo Why wouldn't there be no first? */
                first->sibling = second;
                tree->child = first;
            }
            break;
        case 3:
            first = BuildTree(FIRSTOP(node), snSyntaxTree);
            second = BuildTree(SECONDOP(node), snSyntaxTree);
            third = BuildTree(THIRDOP(node), snSyntaxTree);
            first->sibling = second;
            second->sibling = third;
            tree->child = first;
            break;
        case -1: /* Treated as variadic arity */
            if(sn_GetPoolElement(pool, UNOP(node)), snSyntaxTree)
                first = BuildTree(UNOP(node), snSyntaxTree);

            if(sn_GetPoolElement(pool, LEFTOP(node)), snSyntaxTree)
                first = BuildTree(LEFTOP(node), snSyntaxTree);

            if(sn_GetPoolElement(pool, FIRSTOP(node)), snSyntaxTree)
                first = BuildTree(FIRSTOP(node), snSyntaxTree);

            if(sn_GetPoolElement(pool, RIGHTOP(node)), snSyntaxTree)
                second = BuildTree(RIGHTOP(node), snSyntaxTree);

            if(sn_GetPoolElement(pool, SECONDOP(node)), snSyntaxTree)
                second = BuildTree(SECONDOP(node), snSyntaxTree);

            if(sn_GetPoolElement(pool, THIRDOP(node)), snSyntaxTree)
                third = BuildTree(THIRDOP(node), snSyntaxTree);

            if(first)
                first->sibling = second;
            if(second)
                second->sibling = third;
            if(first)
                tree->child = first;
            break;
        default:
            break;
    }

    return tree;
}

Upvotes: 0

Views: 251

Answers (1)

Hans Passant
Hans Passant

Reputation: 941317

The question is too poorly documented, it looks like a struct member but the C code certainly doesn't look like it is filling a struct. Some hint that it is actually C++ code, you can't pinvoke non-static C++ member functions, the this pointer won't be valid. And will trigger an AVE.

There is a nasty bug visible in the C code with high odds for it causing an AVE, although not normally immediately. You are forgetting that C strings are zero-terminated, you must allocate strlen()+1 bytes for the string. The sprintf() call corrupts the heap when it writes the 0. Heap corruption like this is extraordinary hard to debug since its side-effects can take a while to become noticeable, you only get a quick AVE when you're lucky.

Then there's a memory management problem, somebody is going to have to release the string buffer again. The pinvoke marshaller normally takes on that duty when it copies the struct. And will die with AVE when it calls CoTaskMemFree() to release the string buffer. The string buffer was not allocated with CoTaskMemAlloc(), the pinvoke marshaller or your C# code has no hope of calling the correct free() function.

That's all I can see from the posted info. Enough opportunities for AVE. Using C++/CLI to interop with this code instead is normally a good idea, you'll have half-decent odds for calling the correct free() implementation. Memory management problems disappear when you give the caller the opportunity to allocate the memory.

Upvotes: 3

Related Questions