Aidiakapi

Reputation: 6249

Unsafe string creation from char[]

I'm working on high-performance code in which this construct is part of the performance-critical section.

This is what happens in some section:

  1. A string is 'scanned' and metadata is stored efficiently.
  2. Based upon this metadata chunks of the main string are separated into a char[][].
  3. That char[][] should be transferred into a string[].

Now, I know you can just call new string(char[]), but that copies the contents of the array into the new string.

To avoid this extra copy, I guess it must be possible to write directly to the string's internal buffer, even though this would be an unsafe operation (and I know this brings lots of implications, such as overflow and forward compatibility).

I've seen several ways of achieving this, but none I'm really satisfied with.

Does anyone have concrete suggestions as to how to achieve this?

Extra information:
The actual process doesn't necessarily involve converting to char[]; it's practically a 'multi-substring' operation, for example three ranges of the main string (each an index and a length) appended together.

The StringBuilder has too much overhead for the small number of concats.

EDIT:
Since parts of my question were vague, let me reformulate it.

This is what happens:

  1. Main string is indexed.
  2. Parts of the main string are copied to a char[].
  3. The char[] is converted to a string.
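For reference, the three steps above can be sketched roughly as follows. This is only an illustration of the baseline, not the actual library code: the BaselineSketch class, the Join method and the (Index, Length) pairs are hypothetical names standing in for the real metadata.

```csharp
using System;

static class BaselineSketch
{
    // Hypothetical helper: appends several (index, length) ranges of
    // `source` into one string via a temporary char[].
    public static string Join(string source, (int Index, int Length)[] parts)
    {
        // Step 1 (indexing) is assumed to have produced `parts` already.
        int total = 0;
        foreach (var p in parts) total += p.Length;

        // Step 2: copy the selected parts into a temporary char[].
        var buffer = new char[total];
        int written = 0;
        foreach (var p in parts)
        {
            source.CopyTo(p.Index, buffer, written, p.Length);
            written += p.Length;
        }

        // Step 3: new string(char[]) copies the buffer a second time.
        return new string(buffer);
    }
}
```

Every character is therefore copied twice, once into the array and once into the string.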

What I'd like to do is merge step 2 and 3, resulting in:

  1. Main string is indexed.
  2. Parts of the main string are copied to a string (and the GC can keep its hands off of it during the process by proper use of the fixed keyword?).

Note that I cannot change the output type from string[], since this is an external library and other projects depend on it (backward compatibility).

Upvotes: 7

Views: 1846

Answers (4)

Chris Shain

Reputation: 51349

I think that what you are asking to do is to 'carve up' an existing string in-place into multiple smaller strings without re-allocating character arrays for the smaller strings. This won't work in the managed world.

For one reason why, consider what happens when the garbage collector comes by and collects or moves the original string during a compaction: all of those other strings 'inside' of it are now pointing at some arbitrary other memory, not the original string you carved them out of.

EDIT: In contrast to the character-poking involved in Ben's answer (which is clever but IMHO a bit scary), you can allocate a StringBuilder with a pre-defined capacity, which eliminates the need to re-allocate the internal arrays. See http://msdn.microsoft.com/en-us/library/h1h0a5sy.aspx.
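A minimal sketch of the pre-sized StringBuilder idea, assuming the chunks are described by (index, length) pairs into the main string. The PresizedBuilderSketch class and its Join method are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

static class PresizedBuilderSketch
{
    // Hypothetical helper: appends several (index, length) ranges of
    // `source`, sizing the StringBuilder up front so it never has to grow.
    public static string Join(string source, params (int Index, int Length)[] parts)
    {
        int total = 0;
        foreach (var p in parts) total += p.Length;

        var sb = new StringBuilder(total);
        foreach (var p in parts)
        {
            // Append(string, int, int) copies straight from the source
            // string, so no intermediate Substring or char[] is needed.
            sb.Append(source, p.Index, p.Length);
        }

        // On current runtimes ToString() copies the characters into the
        // final string, so this does not remove the last copy.
        return sb.ToString();
    }
}
```

This avoids re-allocating the builder's internal storage, but whether it is fast enough for the critical section is something only a profiler can answer.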

Upvotes: 3

Sean U

Reputation: 6850

In .NET, there is no way to create an instance of String which shares data with another string. Some discussion on why that is appears in this comment from Eric Lippert.

Upvotes: 0

Ben Voigt

Reputation: 283694

What happens if you do:

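// Must be compiled with /unsafe and placed inside an unsafe context.
string s = GetBuffer();
// Pins s so the GC cannot move it, then writes through a raw pointer
// directly into the string's own character buffer.
fixed (char* pch = s) {
    pch[0] = 'R';
    pch[1] = 'e';
    pch[2] = 's';
    pch[3] = 'u';
    pch[4] = 'l';
    pch[5] = 't';
}

I think the world will come to an end (or at least the .NET managed portion of it), but that's very close to what StringBuilder does.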

Do you have profiler data to show that StringBuilder isn't fast enough for your purposes, or is that an assumption?
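For comparison, newer runtimes (.NET Core 2.1 and later) offer string.Create, which hands a callback a writable span over the new string's buffer before the string is returned. This is a different technique from the pointer-poking above, and it does not exist on the older .NET Framework. The sketch below assumes the chunks are (index, length) pairs; StringCreateSketch and Join are hypothetical names.

```csharp
using System;

static class StringCreateSketch
{
    // Hypothetical helper: appends (index, length) ranges of `source`
    // by writing directly into the new string's buffer.
    public static string Join(string source, (int Index, int Length)[] parts)
    {
        int total = 0;
        foreach (var p in parts) total += p.Length;
        if (total == 0) return string.Empty;

        // The callback receives a Span<char> over the freshly allocated
        // string; the state tuple avoids capturing locals in a closure.
        return string.Create(total, (source, parts), (dest, state) =>
        {
            int written = 0;
            foreach (var p in state.parts)
            {
                state.source.AsSpan(p.Index, p.Length).CopyTo(dest.Slice(written));
                written += p.Length;
            }
        });
    }
}
```

Each character is copied once, from the main string into the result, with no intermediate char[] and no unsafe block.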

Upvotes: 2

Jamie Treworgy

Reputation: 24344

Just create your own addressing system instead of trying to use unsafe code to map to an internal data structure.

Mapping a string (which can be indexed like a char[]) to an array of smaller strings is no different from building a list of address information (index and length of each substring). So make a new List<Tuple<int,int>> instead of a string[] and use that data to return the correct string from your original, unaltered data structure. This could easily be encapsulated into something that exposes a string[].
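A rough sketch of what such an encapsulation might look like. The SubstringIndex class and its members are hypothetical; only the List<Tuple<int,int>> idea comes from the paragraph above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: stores only (index, length) pairs and builds each
// substring on demand from the unaltered source string.
sealed class SubstringIndex
{
    private readonly string _source;
    private readonly List<Tuple<int, int>> _ranges = new List<Tuple<int, int>>();

    public SubstringIndex(string source) { _source = source; }

    public int Count { get { return _ranges.Count; } }

    // Records a range without copying any characters.
    public void Add(int index, int length)
    {
        _ranges.Add(Tuple.Create(index, length));
    }

    // Materializes one substring; the copy happens here, only when requested.
    public string this[int i]
    {
        get { return _source.Substring(_ranges[i].Item1, _ranges[i].Item2); }
    }

    // Produces the string[] that existing callers expect.
    public string[] ToArray()
    {
        var result = new string[_ranges.Count];
        for (int i = 0; i < result.Length; i++) result[i] = this[i];
        return result;
    }
}
```

The scan itself then allocates nothing per chunk. The cost of creating strings is deferred to the callers that actually read them, although ToArray() still pays it in full when a real string[] is required.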

Upvotes: 2
