Chris Hillman

Reputation: 65

Massaging a Fixed Width source file with C#

Problem

Current Data

........Column 1....Column 2.......Column3....Column 4

Row1...........0...........0.............0...........Y    
Row2.......3142.56...........500............0...........N    
Row3.......3142.56...........500............0...........N

The source file has fixed-width columns. The program that exports it doesn't count the digits after the decimal point as part of the reserved column width, so any value with a decimal portion pushes the rest of the row out of alignment.

I have created a C# script that re-writes the file and attempts to resolve this issue.

I have found a way to read each row and split it into columns; each column becomes a string variable. I now need to determine whether the string contains a digit ("0-9") followed by a "." and, if so, count how many decimals come after that pattern. Then I need to delete that many white-space characters from the start of the string.

so

Current State [_ _ _ _ _3142.56]

What we want to see After [_ _ _3142.56]

Attempts so far

So far I have found that Regex seems to do what I'm after. Substring(IndexOf(".")).Length can then be used to count the number of positions from the decimal point onwards.

So I have come up with the below

        // Resolve Decimal Issues
        foreach (object Column in splitLine)
        {
            String CurrentColumn = Column.ToString();

            if (Regex.Match(CurrentColumn, @"^[0-9]+(\.[0-9]+)?$").Success == true)
            {
                // Count how many numbers AFTER a decimal
                int decimalLength = CurrentColumn.Substring(CurrentColumn.IndexOf(".")).Length;
                if (decimalLength >= 1)
                {
                    // Remove this amount of places from the start of the string
                    CurrentColumn = CurrentColumn.Substring(CurrentColumn.Length - decimalLength);
                }
            }

             //Start re-joining the string
            newLine = newLine + CurrentColumn + "\t";
        }

The problem is that IndexOf returns -1 when it finds no match, and passing that to Substring causes an error.

Error Stack

Error: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. 
---> System.ArgumentOutOfRangeException: StartIndex cannot be less than zero.

Parameter name: startIndex
   at System.String.InternalSubStringWithChecks(Int32 startIndex, Int32 length, Boolean fAlwaysCopy)
   at ST_dd38f3d289db4495bf07257723356ed3.csproj.ScriptMain.Main()

   --- End of inner exception stack trace ---
   at System.RuntimeMethodHandle._InvokeMethodFast(Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
   at System.RuntimeMethodHandle.InvokeMethodFast(Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.RuntimeType.InvokeMember(String name, BindingFlags bindingFlags, Binder binder, Object target, Object[] providedArgs, ParameterModifier[] modifiers, CultureInfo culture, String[] namedParams)
   at System.Type.InvokeMember(String name, BindingFlags invokeAttr, Binder binder, Object target, Object[] args, CultureInfo culture)
   at Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTATaskScriptingEngine.ExecuteScript()

So I'm a bit confused as to what I can do to solve this. I think I'm on the right path, but this last error has me a bit lost.

Upvotes: 4

Views: 437

Answers (3)

Jim Mischel

Reputation: 133995

I think your logic is flawed.

Given bbbb123.45 (b is a space), your logic will give a decimalLength of 3. CurrentColumn.Substring(CurrentColumn.Length - decimalLength) will return .45.

What you really want is CurrentColumn.Substring(decimalLength), which skips the first 3 characters and returns b123.45.

The approach is much the same:

    // Resolve Decimal Issues
    foreach (object Column in splitLine)
    {
        String CurrentColumn = Column.ToString();

        if (Regex.IsMatch(CurrentColumn, @"^[0-9]+(\.[0-9]+)?$"))
        {
            // If there's a decimal point, remove characters from the front
            // of the string to compensate for the decimal portion.
            int decimalPos = CurrentColumn.IndexOf(".");
            if (decimalPos != -1)
            {
                CurrentColumn = CurrentColumn.Substring(CurrentColumn.Length - decimalPos);
            }
        }

         //Start re-joining the string
        newLine = newLine + CurrentColumn + "\t";
    }

This fails rather badly, by the way, if the length of the decimal portion exceeds the number of spaces at the front of the string. From your description, I don't think that's a problem. But it's something to keep in mind.
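One way to guard against that case is to cap the number of characters removed at the number of leading spaces actually present. The following is only a minimal sketch, not part of the original code: the names FixedWidthRepair and TrimForDecimals are hypothetical, and it assumes (as the code above does) that the decimal point itself counts towards the overflow, that the padding is plain spaces, and that the caller has already decided the column is numeric.

```csharp
using System;

public static class FixedWidthRepair
{
    // Hypothetical helper: removes one leading space for each character of
    // the decimal portion (the point included), but never removes more than
    // the leading spaces that are actually present.
    public static string TrimForDecimals(string column)
    {
        int decimalPos = column.IndexOf('.');
        if (decimalPos < 0)
        {
            return column; // no decimal point, nothing to compensate for
        }

        int decimalLength = column.Length - decimalPos; // ".45" gives 3
        int leadingSpaces = column.Length - column.TrimStart(' ').Length;

        // Guard: only padding is removed, so digits are never cut off.
        return column.Substring(Math.Min(decimalLength, leadingSpaces));
    }
}
```

With this guard, a column with four leading spaces followed by 123.45 loses three of them, while a column with only one leading space loses just that one and keeps all of its digits.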

Upvotes: 2

unlimit

Reputation: 3752

Try this:

// Resolve Decimal Issues
foreach (object Column in splitLine)
{
    String CurrentColumn = Column.ToString();
    char[] s = {'.'};

    // Only touch numeric columns that actually contain a decimal point
    if (Regex.Match(CurrentColumn, @"^[0-9]+(\.[0-9]+)?$").Success && CurrentColumn.Contains("."))
    {
        // Count how many numbers AFTER a decimal
        int decimalLength = CurrentColumn.Split(s, StringSplitOptions.None)[1].Length;
        if (decimalLength >= 1)
        {
            // Remove this amount of places from the start of the string
            CurrentColumn = CurrentColumn.Substring(decimalLength);
        }
    }

    //Start re-joining the string
    newLine = newLine + CurrentColumn + "\t";
}

Upvotes: 0

Alex

Reputation: 23300

A short, dense and LINQed approach would be the following. No need to look for anything, just split, pack, pad and rebuild. This actually (I just noticed) works for any text file which is to be made a fixed-width one.

// "inputData" is assumed to contain the whole source file

const int desiredFixedWidth = 12; // How wide do you want your columns?
const char paddingChar = ' '; // What char do you want to pad your columns with?

// Step 1: Split the lines
var srcLines = inputData.Split(new string[]{Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);

// Step 2: Split up each line, ditch extra chars, pad the values, rebuild the file
var outLines = srcLines.Select(s => 
    string.Join(paddingChar.ToString(), 
        s.Split(new string[] { paddingChar.ToString() }, StringSplitOptions.RemoveEmptyEntries)
            .Select(l => l.PadLeft(desiredFixedWidth, paddingChar))));
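To try the same idea on a small sample, the snippet can be wrapped in a method. This is a sketch under assumptions rather than a drop-in implementation: the names FixedWidthRebuild and Rebuild are hypothetical, the width and padding character are passed in as parameters, and, unlike the snippet above, it splits on both "\r\n" and "\n" so that the line endings of the input do not matter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FixedWidthRebuild
{
    // Hypothetical wrapper: splits each line on the padding character,
    // drops the empty entries, left-pads every value to the same width
    // and joins the values back together with one padding character.
    public static List<string> Rebuild(string inputData, int width, char pad)
    {
        var lines = inputData.Split(
            new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        return lines
            .Select(line => string.Join(pad.ToString(),
                line.Split(new[] { pad }, StringSplitOptions.RemoveEmptyEntries)
                    .Select(value => value.PadLeft(width, pad))))
            .ToList();
    }
}
```

Calling FixedWidthRebuild.Rebuild(inputData, 12, ' ') would mirror the constants used above. Two limits are worth keeping in mind: PadLeft never truncates, so a value longer than the chosen width still breaks the alignment, and a column that is entirely blank disappears because empty entries are removed.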

On a side note, the "generator" of your broken file needs to be fixed to adhere to the width you want ...

Upvotes: 0
