MarcinJuraszek

Reputation: 125620

String to Stream without reallocating the entire content as byte[]

I know I can create a Stream out of a string with a simple MemoryStream+StreamWriter combination:

MemoryStream stream = new MemoryStream();
StreamWriter writer = new StreamWriter(stream);
writer.Write(value);
writer.Flush();
stream.Position = 0;

Or using GetBytes on an encoding:

new MemoryStream(Encoding.UTF8.GetBytes(value ?? ""))

However, both of these solutions end up copying the entire string into a byte[]. For long strings, that extra allocation is a significant cost.

Is there a way to get a Stream directly over a string without reallocating the entire thing? Similar to how you can wrap a MemoryStream over an existing byte[].

Upvotes: 1

Views: 98

Answers (1)

TnTinMn

Reputation: 11801

Is there a way to get a Stream directly over a string without reallocating the entire thing? Similar to how you can wrap a MemoryStream over an existing byte[].

Create a custom stream class to wrap it. The following is very crude, but it performed a lot better than I had expected. I just whipped this up, so it has not had much testing (i.e., it worked for my one test run), but it should be enough to show the concept.

Edit: I have modified the code to address the issue raised in the comments that the original code could not read a single byte from a multi-byte character. I also added an optional encoding parameter for converting the characters to bytes. If no encoding is provided, UTF-8 is used.

The code has been minimally tested using a StreamReader, but should not be construed as ready for production use.

Imports System.Text

Public Class StringStream : Inherits IO.Stream
    Private bm As BufferManager

    ''' <summary>
    ''' Creates a non seekable stream from a System.String
    ''' </summary>
    ''' <param name="source"></param>
    ''' <param name="encoding">Default UTF-8</param>
    ''' <remarks></remarks>
    Public Sub New(source As String, Optional encoding As System.Text.Encoding = Nothing)
        Me.bm = New BufferManager(source, encoding)
    End Sub

    Public Overrides ReadOnly Property CanRead As Boolean
        Get
            Return True
        End Get
    End Property

    Public Overrides ReadOnly Property CanSeek As Boolean
        Get
            Return False
        End Get
    End Property

    Public Overrides ReadOnly Property CanWrite As Boolean
        Get
            Return False
        End Get
    End Property

    Public ReadOnly Property Encoding As System.Text.Encoding
        Get
            Return bm.Encoding
        End Get
    End Property

    Public Overrides Sub Flush()
    End Sub

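    Public Overrides ReadOnly Property Length As Long
        Get
            ' The encoded length is not known without encoding the whole string,
            ' and the stream is not seekable, so Length is not supported.
            Throw New NotSupportedException()
        End Get
    End Property

    Public Overrides Property Position As Long
        Get
            ' Number of bytes handed out so far.
            Return Me.bm.Position
        End Get
        Set(value As Long)
            ' Seeking is not supported.
            Throw New NotSupportedException()
        End Set
    End Property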

    Public Overrides Function ReadByte() As Integer
        ' Ref: https://msdn.microsoft.com/en-us/library/system.io.stream.readbyte(v=vs.110).aspx
        ' Reads a byte from the stream and advances the position within the stream by one byte, 
        ' or returns -1 if at the end of the stream.
        Dim ret As Int32 = -1
        Dim b As Byte
        If Me.bm.GetByte(b) Then ret = b
        Return ret
    End Function

    Public Overrides Function Read(buffer() As Byte, offset As Integer, count As Integer) As Integer
        ' Ref: https://msdn.microsoft.com/en-us/library/system.io.stream.read(v=vs.110).aspx
        ' Return Value: The total number of bytes read into the buffer.
        ' This can be less than the number of bytes requested if that many bytes are not currently available,
        ' or zero (0) if the end of the stream has been reached.

        ' Start writing at offset and never run past the end of the caller's buffer.
        Dim maxReturnedCount As Int32 = Math.Min(buffer.Length - offset, count)
        Dim returnCount As Int32
        For i As Int32 = 0 To maxReturnedCount - 1
            If Me.bm.GetByte(buffer(offset + i)) Then
                returnCount += 1
            Else
                Exit For
            End If
        Next
        Return returnCount
    End Function

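    ' The stream is read-only and forward-only, so the remaining members
    ' report that instead of silently doing nothing.
    Public Overrides Function Seek(offset As Long, origin As IO.SeekOrigin) As Long
        Throw New NotSupportedException()
    End Function

    Public Overrides Sub SetLength(value As Long)
        Throw New NotSupportedException()
    End Sub

    Public Overrides Sub Write(buffer() As Byte, offset As Integer, count As Integer)
        Throw New NotSupportedException()
    End Sub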

    Private Class BufferManager
        Private buffer As Byte()
        Private bufferPosition As Int32
        Private source As String
        Private positionInSource As Int32
        Private _encoding As System.Text.Encoding
        Private numBytesInbuffer As Int32
        Private readPosition As Int32

        Public Sub New(source As String, encoding As System.Text.Encoding)
            If encoding Is Nothing Then
                encoding = System.Text.Encoding.UTF8
            End If
            Me.source = source
            Me._encoding = encoding
            ' Sized for two chars so that a surrogate pair can be encoded as one unit.
            buffer = New Byte(0 To encoding.GetMaxByteCount(2) - 1) {}
        End Sub

        Public ReadOnly Property HasBytes As Boolean
            Get
                Return (numBytesInbuffer > 0) OrElse LoadCharToBuffer()
            End Get
        End Property

        Public ReadOnly Property Encoding As System.Text.Encoding
            Get
                Return Me._encoding
            End Get
        End Property

        Public ReadOnly Property Position As Int32
            Get
                Return Me.readPosition
            End Get
        End Property

        Private Function LoadCharToBuffer() As Boolean
            Dim ret As Boolean
            If positionInSource < Me.source.Length Then
                ' Encode a surrogate pair together; encoding its two halves separately
                ' would yield replacement characters instead of the real code point.
                Dim charCount As Int32 = If(Char.IsSurrogatePair(source, positionInSource), 2, 1)
                Me.numBytesInbuffer = Me._encoding.GetBytes(source, positionInSource, charCount, buffer, 0)
                Me.positionInSource += charCount
                Me.bufferPosition = 0
                ret = True
            End If
            Return ret
        End Function

        Public Function GetByte(ByRef value As Byte) As Boolean
            Dim ret As Boolean = Me.HasBytes
            If ret Then
                value = buffer(bufferPosition)
                Me.bufferPosition += 1
                Me.numBytesInbuffer -= 1
                Me.readPosition += 1
            End If
            Return ret
        End Function

    End Class

End Class
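
For reference, a minimal sketch of how the class might be consumed, along the lines of the StreamReader test mentioned above. It assumes the StringStream class from this answer is compiled into the same project; the module name and the sample text are made up for illustration.

```vb
Imports System
Imports System.IO

Module StringStreamDemo   ' hypothetical module name, for illustration only
    Sub Main()
        ' Sample text with multi-byte UTF-8 characters (an assumption, not from the original test).
        Dim sampleText As String = "héllo wörld"

        ' StringStream is the class defined above; with no encoding argument it uses UTF-8.
        Using source As New StringStream(sampleText)
            ' Decode with the same encoding that the stream uses to produce its bytes.
            Using reader As New StreamReader(source, source.Encoding)
                Dim roundTripped As String = reader.ReadToEnd()
                ' Report whether the text survived the string -> bytes -> string round trip.
                Console.WriteLine(roundTripped = sampleText)
            End Using
        End Using
    End Sub
End Module
```

Only the small per-character byte buffer inside the stream is allocated while reading; the source string itself is never copied into a byte[].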

Upvotes: 1
