avword

Reputation: 219

Finding the Index of Characters in a String

I have a text file that is automatically generated by an older computer system daily.

Unfortunately, the columns in this file are not delimited, and they are not exactly fixed width either: each day the width of each column can change depending on the number of characters in that column's data. The file does have column headings, so I want to find the width of each column from the column headings. Here is an example of the column heading row:

JOB_NO[variable amount of white space chars]FILE_NAME[variable amount of ws chars]PROJECT_CODE[variable amount of ws chars][carriage return]

What I want to do is get the index of the first character of a column and the index of the last whitespace character of that column (from the column heading). For the first column, I would want the index of the "J" in JOB_NO and the index of the last whitespace before the "F" in FILE_NAME.

I should also mention that the columns may not always be in the same order from day to day, but they will always have the same header names.

Any thoughts about how to do this in VB.NET or C#? I know I can use String.IndexOf("JOB_NO") to get the index of the start of the column, but how do I get the index of the last whitespace in each column (that is, the last whitespace before the next non-whitespace character that marks the start of the next column)?

Upvotes: 4

Views: 4205

Answers (3)

jangeador

Reputation: 604

Here is an alternative answer using a small class that you can later use to parse your lines. You can use the Fields collection as a template to pull the fields out of each of your lines. This solution does not ignore the padding whitespace: each field's length includes it, since I presume the padding varies because the data in each field varies in length from day to day, and you need the full width to read that data.

Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module Module1

    Sub Main()

        Dim line As String = "JOB_NUM      FILE_NAME         SOME_OTHER_THING  "
        Dim Fields As New List(Of Field)

        'Match the first character of each column heading: a word
        'character preceded by either the start of the line or a space.
        Dim mc As MatchCollection = Regex.Matches(line, "(?<=^| )\w")

        'Loop through the matches
        For Each m As Match In mc
            Dim oField As New Field
            oField.Start = m.Index
            'NextMatch() returns an unsuccessful match when there are no more
            Dim nextHeading As Match = m.NextMatch()
            If Not nextHeading.Success Then
                'This is the last field: it runs to the end of the line
                oField.Length = line.Length - oField.Start
            Else
                'Otherwise it runs up to the start of the next heading
                oField.Length = nextHeading.Index - oField.Start
            End If
            'The heading plus its padding, trimmed to get the field name
            oField.Name = line.Substring(oField.Start, oField.Length).Trim()
            'Add to the list
            Fields.Add(oField)
        Next

        'Check the Fields: you can use line.Substring(f.Start, f.Length)
        'to parse each line of your file.

        For Each f As Field In Fields
            Console.WriteLine("Field Name: " & f.Name)
            Console.WriteLine("Start: " & f.Start)
            Console.WriteLine("Length: " & f.Length)
        Next

        Console.Read()
    End Sub

    Class Field
        Public Property Name As String
        Public Property Start As Integer
        Public Property Length As Integer
    End Class

End Module
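For C# readers, here is a rough port of the same idea, extended to show the "use the fields collection as a template" step on a data line. It is a sketch that assumes every data line is aligned with the header line; the Field record, the FixedWidthParser class, and the sample header and data strings are hypothetical. ParseLine clamps to the data line's length because a data line may be shorter than the header (for example, if trailing spaces were trimmed).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical record holding a column's name and its character span.
public record Field(string Name, int Start, int Length);

public static class FixedWidthParser
{
    // Build the field template from the heading line: each field starts at
    // the first character of a heading and runs up to the next heading
    // (or to the end of the line for the last heading).
    public static List<Field> GetFields(string header)
    {
        var fields = new List<Field>();
        MatchCollection mc = Regex.Matches(header, @"(?<=^| )\w");
        for (int i = 0; i < mc.Count; i++)
        {
            int start = mc[i].Index;
            int end = i + 1 < mc.Count ? mc[i + 1].Index : header.Length;
            string name = header.Substring(start, end - start).Trim();
            fields.Add(new Field(name, start, end - start));
        }
        return fields;
    }

    // Slice one data line using the template. Start and length are clamped
    // so that a line shorter than the header does not throw.
    public static Dictionary<string, string> ParseLine(string line, List<Field> fields)
    {
        var values = new Dictionary<string, string>();
        foreach (Field f in fields)
        {
            int start = Math.Min(f.Start, line.Length);
            int length = Math.Min(f.Length, line.Length - start);
            values[f.Name] = line.Substring(start, length).Trim();
        }
        return values;
    }

    public static void Main()
    {
        string header = "JOB_NO   FILE_NAME    PROJECT_CODE";
        string data   = "1234     report.txt   P-77";
        List<Field> fields = GetFields(header);
        foreach (var kv in ParseLine(data, fields))
            Console.WriteLine($"{kv.Key} = '{kv.Value}'");
    }
}
```

Computing the template once from the header and reusing it for every data line keeps the per-line work to plain Substring calls.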

Upvotes: 0

spender

Reputation: 120450

Borrowing heavily from a previous answer I've given... To get column positions, how about this? I'm making the assumption that column names do not contain spaces.

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

IEnumerable<int> positions = Regex
    .Matches("JOB_NUM   FILE_NAME         SOME_OTHER_THING", @"(?<=^| )\w")
    .Cast<Match>()
    .Select(m => m.Index);

Or, as a more verbose version of the above:

//first get a MatchCollection
//this regular expression matches a word character that immediately follows
//either the start of the line or a space, i.e. the first char of each of
//your column headers
MatchCollection matches=Regex
    .Matches("JOB_NUM   FILE_NAME         SOME_OTHER_THING",@"(?<=^| )\w");
//convert to IEnumerable<Match>, so we can use Linq on our matches
IEnumerable<Match> matchEnumerable=matches.Cast<Match>();
//For each match, select its Index
IEnumerable<int> positions=matchEnumerable.Select(m=>m.Index);
//convert to array (if you want)
int[] pos_arr=positions.ToArray();
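The positions above are the start of each column. To also get the index the question asks for (the last whitespace before the next column begins), a minimal sketch could pair each start with the next start minus 1, assuming the last column runs to the end of the header line. The ColumnBounds class and the GetBounds method are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ColumnBounds
{
    // For each heading, return the index of its first character and the
    // index of the last character before the next heading begins (the last
    // padding space). Assumption: the final column ends at the end of the line.
    public static (int Start, int LastIndex)[] GetBounds(string header)
    {
        int[] starts = Regex.Matches(header, @"(?<=^| )\w")
            .Cast<Match>()
            .Select(m => m.Index)
            .ToArray();

        return starts
            .Select((start, i) => (start,
                i + 1 < starts.Length ? starts[i + 1] - 1 : header.Length - 1))
            .ToArray();
    }

    public static void Main()
    {
        string header = "JOB_NO   FILE_NAME    PROJECT_CODE";
        foreach (var (start, last) in GetBounds(header))
            Console.WriteLine($"{start}..{last}");
    }
}
```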

Upvotes: 0

Kamyar

Reputation: 18797

First get the index of each column heading, e.g.:

var jPos = str.IndexOf("JOB_NO");
var filePos = str.IndexOf("FILE_NAME");
var projPos = str.IndexOf("PROJECT_CODE");  

Then put them into an array and sort it from smallest to largest. Now you know the order of your columns, and the last space of each column is at the next column's index minus 1.

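int[] ar = { jPos, filePos, projPos };
Array.Sort(ar); // ascending, so ar[0] is the first column's start
int firstColLastSpace = ar[1] - 1;
int secColLastSpace = ar[2] - 1;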
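Because the question says the column order can change from day to day, a sketch of the same approach keyed by heading name might look like the following. It assumes each heading appears exactly once in the header and is not a substring of another heading (plain IndexOf would otherwise match the wrong spot), and that the last column ends at the end of the header line; NamedColumns and Locate are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NamedColumns
{
    // Locate each known heading by name, sort the headings by position,
    // and derive each column's first index and last index. A column's last
    // index is the next column's start minus 1; the final column runs to
    // the end of the header line.
    public static Dictionary<string, (int Start, int LastIndex)> Locate(
        string header, IEnumerable<string> names)
    {
        var ordered = names
            .Select(n => (Name: n, Start: header.IndexOf(n, StringComparison.Ordinal)))
            .OrderBy(c => c.Start)
            .ToList();

        if (ordered.Any(c => c.Start < 0))
            throw new ArgumentException("A column heading was not found.");

        var result = new Dictionary<string, (int Start, int LastIndex)>();
        for (int i = 0; i < ordered.Count; i++)
        {
            int last = i + 1 < ordered.Count ? ordered[i + 1].Start - 1 : header.Length - 1;
            result[ordered[i].Name] = (ordered[i].Start, last);
        }
        return result;
    }

    public static void Main()
    {
        // Same headings as the question, but in a different order.
        string header = "PROJECT_CODE  JOB_NO  FILE_NAME     ";
        var cols = Locate(header, new[] { "JOB_NO", "FILE_NAME", "PROJECT_CODE" });
        foreach (var kv in cols)
            Console.WriteLine($"{kv.Key}: {kv.Value.Start}..{kv.Value.LastIndex}");
    }
}
```

Looking columns up by name rather than by position means the rest of the parsing code does not care which order the file uses on a given day.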

Upvotes: 3
