Dom Sinclair

Reputation: 2528

What is the most efficient way to extract information from a text file formatted like this?

Consider a text file stored in an online location that looks like this:

    ;aiu;

[MyEditor45]
Name = MyEditor 4.5
URL = http://www.myeditor.com/download/myeditor.msi
Size = 3023788
Description = This is the latest version of MyEditor
Feature = Support for other file types 
Feature1 = Support for different encodings
BugFix = Fix bug with file open
BugFix1 = Fix crash when opening large files
BugFix2 = Fix bug with search in file feature
FilePath = %ProgramFiles%\MyEditor\MyEditor.exe
Version = 4.5

It describes a possible update to an application that a user could download. I want to load this into a StreamReader, parse it, and then build up lists of Features, BugFixes, etc. to display to the end user in a WPF list box.

I have the following piece of code that essentially gets my text file (first extracting its location from a local ini file) and loads it into a StreamReader. This at least works, although I know that there is no error checking at present; I just want to establish the most efficient way to parse the file first. One of these files is unlikely ever to exceed about 250-400 lines of text.

Dim UpdateUrl As String = GetUrl()
Dim client As New WebClient()
Using myStreamReader As New StreamReader(client.OpenRead($"{UpdateUrl}"))

    While Not myStreamReader.EndOfStream
        Dim line As String = myStreamReader.ReadLine
        If line.Contains("=") Then
            Dim p As String() = line.Split(New Char() {"="c})
            If p(0).Contains("BugFix") Then
                MessageBox.Show($" {p(1)}")
            End If
        End If
    End While
End Using

Specifically, I'm looking to collate the information about Features, BugFixes and Enhancements. Whilst I could construct what would in effect be a rather messy If statement, I feel sure there must be a more efficient way to do this, possibly involving LINQ. I'd welcome any suggestions.

I have added the WPF tag on the off chance that someone with more experience of displaying information in WPF list boxes than I have might spot a way to define the information I'm after so that it can easily be displayed in a WPF list box in three sections (Features, Enhancements and BugFixes).

Upvotes: 0

Views: 229

Answers (3)

Gert Arnold

Reputation: 109137

The given answer only focuses on the first part, converting the data to a structure that can be shaped for display. But I think your main question is how to do the actual shaping.

I used a somewhat different way to collect the file data, using Microsoft.VisualBasic.FileIO.TextFieldParser, because I think that makes coding just a little bit easier:

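Imports Microsoft.VisualBasic.FileIO ' TextFieldParser, FieldType

' Yields (name, value) pairs for every line that contains the delimiter.
Iterator Function GetTwoItemLines(fileName As String, delimiter As String) _
        As IEnumerable(Of Tuple(Of String, String))
    Using tfp = New TextFieldParser(fileName)
        tfp.TextFieldType = FieldType.Delimited
        tfp.Delimiters = {delimiter}
        tfp.HasFieldsEnclosedInQuotes = False
        tfp.TrimWhiteSpace = False

        While Not tfp.EndOfData
            Dim arr = tfp.ReadFields()
            ' Lines without a delimiter (e.g. "[MyEditor45]") have one field and are skipped.
            If arr.Length >= 2 Then
                ' Re-join everything after the first field so "=" inside a value survives.
                Yield Tuple.Create(arr(0).Trim(), String.Join(delimiter, arr.Skip(1)).Trim())
            End If
        End While
    End Using
End Function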

Effectively the same thing happens as in your code, but taking into account Andrew's keen caution about data loss: a line is split by = characters, but the second field of a line consists of all parts after the first part with the delimiter re-inserted: String.Join(delimiter, arr.Skip(1)).Trim().

You can use this function as follows:

Dim fileContent = GetTwoItemLines(file, "=")

For display, I think the best approach (most efficient in terms of lines of code) is to group the lines by their first items, removing the numeric part at the end:

Dim grouping = fileContent.GroupBy(Function(c) c.Item1.TrimEnd("0123456789".ToCharArray())) _
    .Where(Function(k) k.Key = "Feature" OrElse k.Key = "BugFix" OrElse k.Key = "Enhancement")

Here's a LINQPad dump (in which I took the liberty of changing one item a bit to demonstrate that multiple = characters are handled correctly):

[Image: LINQPad dump of the grouped result]
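
To make the shape of that grouping concrete in text form, here is a minimal, self-contained sketch. It assumes the lines have already been parsed into (name, value) tuples, as GetTwoItemLines does. The module name GroupingSketch, the helper GroupUpdateInfo, and the sample values are hypothetical and only serve the illustration.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module GroupingSketch
    ' Digits stripped from the end of a key, so "BugFix2" becomes "BugFix".
    Private ReadOnly Digits As Char() = "0123456789".ToCharArray()

    ' Section names worth showing to the user (assumed from the question).
    Private ReadOnly Wanted As String() = {"Feature", "Enhancement", "BugFix"}

    ' Groups parsed (name, value) pairs into one list of values per section.
    Function GroupUpdateInfo(items As IEnumerable(Of Tuple(Of String, String))) _
            As Dictionary(Of String, List(Of String))
        Return items.
            GroupBy(Function(t) t.Item1.TrimEnd(Digits), Function(t) t.Item2).
            Where(Function(g) Wanted.Contains(g.Key)).
            ToDictionary(Function(g) g.Key, Function(g) g.ToList())
    End Function

    Sub Main()
        ' Hypothetical sample data standing in for the output of GetTwoItemLines.
        Dim parsed = New List(Of Tuple(Of String, String)) From {
            Tuple.Create("Name", "MyEditor 4.5"),
            Tuple.Create("Feature", "Support for other file types"),
            Tuple.Create("Feature1", "Support for different encodings"),
            Tuple.Create("BugFix", "Fix bug with file open"),
            Tuple.Create("BugFix1", "Fix crash when opening large files")
        }

        ' Print each section name followed by its entries.
        For Each group In GroupUpdateInfo(parsed)
            Console.WriteLine(group.Key)
            For Each entry In group.Value
                Console.WriteLine("  " & entry)
            Next
        Next
    End Sub
End Module
```

Each list in the resulting dictionary could then be assigned to the ItemsSource of its own list box, or the dictionary itself could back a grouped view; which of the two fits better depends on how the three sections are laid out.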

Upvotes: 1

VBobCat

Reputation: 2712

You could do it with Regular Expressions:

Imports System.Text.RegularExpressions

Private Function InfoReader(ByVal sourceText As String) As List(Of Dictionary(Of String, String()))
    '1) make array of fragments for each product info
    Dim products = Regex.Split(sourceText, "(?=\[\s*\w+\s*])")
    '2) declare variables needed ahead
    Dim productProperties As Dictionary(Of String, String)
    Dim propertyNames As String()
    Dim productGroupedProperties As Dictionary(Of String, String())
    Dim result As New List(Of Dictionary(Of String, String()))
    '3) iterate along fragments
    For Each product In products
        '4) work only in significant fragments ([Product]...)
        If Regex.IsMatch(product, "\A\[\s*\w+\s*]") Then
            '5) make array of property lines and extract dictionary of property/description;
            '   the value pattern starts after the first "=" and drops surrounding whitespace,
            '   so it also works for the last line of the file and for CRLF line endings
            productProperties = Regex.Split(product, "(?=^\w+\s*=)", RegexOptions.Multiline).Where(
            Function(s) s.Contains("="c)
            ).ToDictionary(
            Function(s) Regex.Match(s, "^\w+(?=\s*=)").Value,
            Function(s) Regex.Match(s, "(?<==\s*)\S.*?(?=\s*$)", RegexOptions.Multiline).Value)
            '6) extract distinct property names, ignoring numbered repetitions
            propertyNames = productProperties.Keys.Select(Function(s) s.TrimEnd("0123456789".ToCharArray)).Distinct.ToArray
            '7) make dictionary of distinctProperty/Array(Of String){description, description1, ...}
            productGroupedProperties = propertyNames.ToDictionary(
            Function(s) s,
            Function(s) productProperties.Where(
                Function(kvp) kvp.Key.StartsWith(s)
                ).Select(
                Function(kvp) kvp.Value).ToArray)
            '8) enlist dictionary to result
            result.Add(productGroupedProperties)
        End If
    Next
    Return result
End Function

Upvotes: 0

user5684647

Reputation:

Dom, here is an answer, originally written in C# and also converted to VB.Net. First, since the file is small, read all of it into a list of strings. Then select the strings that contain an "=" and parse them into data items. This code returns a set of data items that you can then display as you like. If you have LinqPad, you can test the code below, or I have the code here: dotnetfiddle

Here is the VB.Net version: VB.Net dotnetfiddle

Imports System
Imports System.Collections.Generic
Imports System.Linq

Public Class Program
    Public Sub Main()
        Dim fileContent As List(Of String) = GetFileContent()

        Dim dataItems = fileContent.Where(Function(c) c.Contains("=")).[Select](Function(c) GetDataItem(c))

        dataItems.Dump()
    End Sub


    Public Function GetFileContent() As List(Of String)
        Dim contentList As New List(Of String)()

        contentList.Add("sb.app; aiu;")
        contentList.Add("")
        contentList.Add("[MyEditor45]")
        contentList.Add("Name = MyEditor 4.5")
        contentList.Add("URL = http://www.myeditor.com/download/myeditor.msi")
        contentList.Add("Size = 3023788")
        contentList.Add("Description = This is the latest version of MyEditor")
        contentList.Add("Feature = Support for other file types")
        contentList.Add("Feature1 = Support for different encodings")
        contentList.Add("BugFix = Fix bug with file open")
        contentList.Add("BugFix1 = Fix crash when opening large files")
        contentList.Add("BugFix2 = Fix bug with search in file feature")
        contentList.Add("FilePath = %ProgramFiles%\MyEditor\MyEditor.exe")
        contentList.Add("Version = 4.5")

        Return contentList
    End Function

    Public Function GetDataItem(value As String) As DataItem
        Dim parts = value.Split("=", 2, StringSplitOptions.None)

        Dim dataItem = New DataItem()

        dataItem.DataType = parts(0).Trim()
        dataItem.Data = parts(1).Trim()

        Return dataItem
    End Function
End Class

Public Class DataItem
    Public DataType As String
    Public Data As String
End Class

Or, in C#:

void Main()
{
    List<string> fileContent = GetFileContent();

    var dataItems = fileContent.Where(c => c.Contains("="))
                               .Select(c => GetDataItem(c)); 


    dataItems.Dump();
}    

public List<string> GetFileContent()
{
    List<string> contentList = new List<string>();

    contentList.Add("sb.app; aiu;");
    contentList.Add("");
    contentList.Add("[MyEditor45]");
    contentList.Add("Name = MyEditor 4.5");
    contentList.Add("URL = http://www.myeditor.com/download/myeditor.msi");
    contentList.Add("Size = 3023788");
    contentList.Add("Description = This is the latest version of MyEditor");
    contentList.Add("Feature = Support for other file types");
    contentList.Add("Feature1 = Support for different encodings");
    contentList.Add("BugFix = Fix bug with file open");
    contentList.Add("BugFix1 = Fix crash when opening large files");
    contentList.Add("BugFix2 = Fix bug with search in file feature");
    contentList.Add("FilePath = %ProgramFiles%\\MyEditor\\MyEditor.exe");
    contentList.Add("Version = 4.5");

    return contentList;
}

public DataItem GetDataItem(string value)
{
    // Split on the first '=' only, so values that contain '=' stay intact.
    var parts = value.Split(new[] { '=' }, 2);

    var dataItem = new DataItem()
    {
        DataType = parts[0].Trim(),
        Data = parts[1].Trim()
    };

    return dataItem;
}

public class DataItem
{
    public string DataType;
    public string Data;
}
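
GetFileContent above hardcodes the sample lines. With a real download, the same list could be filled from any TextReader, such as the StreamReader in the question. Here is a minimal VB.Net sketch of that idea; the module name ReadLinesSketch and the helper ReadAllLines are hypothetical, and a StringReader stands in for the network stream so the example runs on its own.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

Module ReadLinesSketch
    ' Reads every remaining line from the reader into a list.
    ' Reasonable for small files such as the few hundred lines mentioned in the question.
    Function ReadAllLines(reader As TextReader) As List(Of String)
        Dim lines As New List(Of String)()
        Dim line As String = reader.ReadLine()
        While line IsNot Nothing
            lines.Add(line)
            line = reader.ReadLine()
        End While
        Return lines
    End Function

    Sub Main()
        ' The StringReader is a stand-in for New StreamReader(client.OpenRead(url)).
        Dim sample = String.Join(Environment.NewLine,
                                 {"[MyEditor45]", "Name = MyEditor 4.5", "BugFix = Fix bug with file open"})
        Using reader As New StringReader(sample)
            Dim fileContent = ReadAllLines(reader)
            Console.WriteLine(fileContent.Count)
        End Using
    End Sub
End Module
```

The resulting list can be passed through the same Where and Select calls shown in Main above.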

Upvotes: 1
