gunakkoc

Reputation: 1069

Most efficient way of saving list of lists to a file?

In VS 2010, I have a big list of strings, and each item in the list also contains a list of strings (it does not nest any further). The good thing is that only additions will take place; nothing will be removed from the lists.

I do NOT want to use a database. Since the list might get quite large, XML seemed slow to me. I couldn't find any common solution for my case. Any ideas?

Edit: Okay, some of my code will make it clearer, I guess.

Class Word
    Public theWord As String
    Public SubWords As New List(Of SubWord)
    Public Count As Integer = 1
    Sub New(ByRef Word As String)
        theWord = Word
    End Sub
    Public Sub AddSubWord(ByRef Word As String)
        Dim SubWordCount As Integer = SubWords.Count - 1
        Dim Found As Boolean
        For i = 0 To SubWordCount
            If SubWords(i).theWord = Word Then
                SubWords(i).Count += 1
                Found = True
                Exit For
            End If
        Next
        If Found = False Then
            SubWords.Add(New SubWord(Word))
        End If
    End Sub
    Public Overrides Function ToString() As String
        Return theWord
    End Function
End Class

Class SubWord
    Public theWord As String
    Public Count As Integer = 1
    Sub New(ByRef Word As String)
        theWord = Word
    End Sub
    Public Overrides Function ToString() As String
        Return theWord
    End Function
End Class

Also the list I have is:

Dim Words As New List(Of Word)

The aim is to add a word to the list if it is not already there; otherwise, its count is increased. The same goes for the subwords. Later, all the lists will be sorted by count. There will be very many words, each with a huge subword list.

Upvotes: 2

Views: 234

Answers (1)

Steven Doggart

Reputation: 43743

XML does seem like the best option, but if you are really concerned about efficiency, and you are certain the data structure isn't going to change in the future, you could simply store the data in a delimited text file. For instance:

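' Requires Imports System.IO at the top of the file.
Private Sub SaveList(filePath As String, list As List(Of List(Of String)))
    Const fieldDelimiter As String = ","
    ' Environment.NewLine is not a compile-time constant, so it cannot be a Const.
    Dim recordDelimiter As String = Environment.NewLine
    Dim temp As New List(Of String)()
    For Each i As List(Of String) In list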
        temp.Add(String.Join(fieldDelimiter, i.ToArray()))
    Next
    Dim contents As String = String.Join(recordDelimiter, temp.ToArray())
    File.WriteAllText(filePath, contents)
End Sub

Or, more efficiently:

' Requires Imports System.IO at the top of the file.
Private Sub SaveList(filePath As String, list As List(Of List(Of String)))
    Const fieldDelimiter As String = ","
    ' Environment.NewLine is not a compile-time constant, so it cannot be a Const.
    Dim recordDelimiter As String = Environment.NewLine
    Using writer As New StreamWriter(filePath)
        Dim firstRecord As Boolean = True
        For Each record As List(Of String) In list
            If firstRecord Then
                firstRecord = False
            Else
                writer.Write(recordDelimiter)
            End If
            Dim firstField As Boolean = True
            For Each field As String In record
                If firstField Then
                    firstField = False
                Else
                    writer.Write(fieldDelimiter)
                End If
                writer.Write(field)
            Next
        Next
    End Using
End Sub

The drawback to this approach is that you need to make sure the delimiters you use will never occur within any of the fields in any of the records. If you know for sure the strings will never contain a certain unusual character, then you could just use that. Otherwise, the alternative would be to escape any occurrences. So for instance, if you are using a comma as the delimiter and a backslash as the escape character, you would first replace all occurrences of \ with \\, and then replace all occurrences of , with \, (in the opposite order, the backslashes you just added in front of the commas would get doubled as well). The same applies to the record delimiter if a field can contain a line break. This, of course, complicates not only your saving logic, but your loading logic as well.
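To make the escaping idea concrete, here is a minimal sketch of an escape function and a matching record splitter. The module and function names (DelimitedEscaping, EscapeField, SplitRecord) are made up for illustration and are not part of the code above. It assumes a comma delimiter, a backslash escape character, and fields that never contain line breaks.

```vb
Imports System.Collections.Generic
Imports System.Text

' Hypothetical helper module, shown only as an illustration.
Module DelimitedEscaping
    Private Const Delimiter As Char = ","c
    Private Const EscapeChar As Char = "\"c

    ' Escape the backslash first, then the comma, so the backslashes
    ' added in front of commas are not doubled again.
    Public Function EscapeField(field As String) As String
        Return field.Replace("\", "\\").Replace(",", "\,")
    End Function

    ' Split one record on unescaped commas and unescape each field.
    Public Function SplitRecord(line As String) As List(Of String)
        Dim fields As New List(Of String)()
        Dim current As New StringBuilder()
        Dim escaped As Boolean = False
        For Each c As Char In line
            If escaped Then
                current.Append(c)       ' character after "\" is taken literally
                escaped = False
            ElseIf c = EscapeChar Then
                escaped = True          ' remember that the next character is literal
            ElseIf c = Delimiter Then
                fields.Add(current.ToString())
                current.Length = 0      ' start collecting the next field
            Else
                current.Append(c)
            End If
        Next
        fields.Add(current.ToString())  ' the last field has no trailing delimiter
        Return fields
    End Function
End Module
```

When saving, you would call EscapeField on each field before writing it. When loading, you would call SplitRecord on each line read from the file. Note that an empty line comes back as a single empty field, so a record with no fields at all cannot be told apart from a record holding one empty string.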

UPDATE

If speed is your main concern, and you can guarantee that Words and Subwords will all be less than 100 characters, then the fastest method of reading and writing the data would be to write each word on a new line of a text file, followed by each Subword, using fixed-width fields. For instance, if you had a max length of five, the file might look something like this:

Word Sub1 Sub2
W2   SW1  SW2  SW3
W3
W4   SubWdSub2.

As you can see in that example, there are four Words ("Word", "W2", "W3", and "W4"), each with a different number of Subwords. The Subwords for "Word" are "Sub1" and "Sub2"; "W2" has three ("SW1", "SW2", and "SW3"); "W3" has none; and "W4" has two ("SubWd" and "Sub2.").

So, to write out that file, you could do something like this:

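' Requires Imports System.IO at the top of the file.
Private Sub SaveWords(filePath As String, words As List(Of Word))
    ' Every field is padded to this width; PadRight does not truncate longer strings.
    Const maxLength As Integer = 100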
    Using writer As New StreamWriter(filePath)
        Dim firstWord As Boolean = True
        For Each w As Word in words
            If firstWord Then
                firstWord = False
            Else
                writer.WriteLine()
            End If
            writer.Write(w.theWord.PadRight(maxLength))
            For Each s As SubWord In w.SubWords
                writer.Write(s.theWord.PadRight(maxLength))
            Next
        Next
    End Using
End Sub
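Reading the file back is then a matter of cutting each line into slices of the same width. The following is only a sketch: the module and function names (FixedWidthReader, ParseLine, LoadFile) are hypothetical, and it returns plain lists of strings, with the Word first and its Subwords after it, rather than rebuilding the Word and SubWord objects. It also recovers only the text, since the file written above does not store the Count values.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

' Hypothetical helper module, shown only as an illustration.
Module FixedWidthReader
    ' Cut one line into fields of fieldWidth characters and trim the padding.
    ' The first field is the Word; the remaining fields are its Subwords.
    Public Function ParseLine(line As String, fieldWidth As Integer) As List(Of String)
        Dim fields As New List(Of String)()
        Dim pos As Integer = 0
        While pos < line.Length
            ' The last field may be short if trailing spaces were stripped.
            Dim n As Integer = Math.Min(fieldWidth, line.Length - pos)
            fields.Add(line.Substring(pos, n).TrimEnd())
            pos += fieldWidth
        End While
        Return fields
    End Function

    ' Read the whole file, one record per line.
    Public Function LoadFile(filePath As String, fieldWidth As Integer) As List(Of List(Of String))
        Dim records As New List(Of List(Of String))()
        Using reader As New StreamReader(filePath)
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                records.Add(ParseLine(line, fieldWidth))
                line = reader.ReadLine()
            End While
        End Using
        Return records
    End Function
End Module
```

For a file written by SaveWords above, you would call LoadFile with a fieldWidth of 100. Because the padding is trimmed on load, a word that itself ends in spaces would not survive the round trip.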

Upvotes: 1
