Matt Wilde
Matt Wilde

Reputation: 293

How to preserve white space when reading fixed length text file

I am reading a .txt file that is, in essence, a csv type file with rows and columns, only the columns are separated by literal [space] characters. Each column has a predefined length of these characters, as given in a data definition file.

Sample of Text File

Below is what I have tried, but it ignores white space. The numbers are what's given me to differentiate columns.

function formatTextFile(input){
    var readFile = fileOpen(input,"read");
    var line1 = FileReadLine(readFile);
    var line2 = FileReadLine(readFile);
    line2 = REReplace(line2,"[^(.{7}.{10}.{47}.{34}.{14}.{13}.{15}.{22}.{28}.{18}.{2})]+[ ]", "|", "all");
    return line2;
}

This is the output from that:

|1. 16-1268|5/2/201|31530|$0.00|404|AP 

I thought about trying to use java to do string formatting. I haven't figured this one out.

Upvotes: 1

Views: 1554

Answers (3)

Matt Wilde
Matt Wilde

Reputation: 293

Thanks @Alex for providing a great solution. Java is the way to go and his method is very functional. I simply want to show how I implemented it into constructing a query.

Left with an array of columns and rows which you can put into a query object with QueryAddColumn. I had to mess with the column lengths but the java method is the way to go.Thanks again @Alex.

function formatTextFile(input){
        var fileReader = createObject("java","java.io.FileReader").init(input);
        var bufferedReader = createObject("java","java.io.BufferedReader").init(fileReader);
        var lineReader = createObject("java","java.io.LineNumberReader").init(bufferedReader);

        var columnLengths = [7,10,47,34,14,13,16,22,75,18,2];
        var rows = ArrayNew(2);
        var line = lineReader.readLine();
        //skip empty lines
        while(!structKeyExists(Local,"line")&&!len(trim(Local.line))){
            line = lineReader.readLine();
        }

        while(isDefined("line")){
            var cells = ArrayNew(1);
            var offset = 1;
            for(var i = 1; i<=ArrayLen(columnLengths); i++){
                cells.add(mid(line,offset,columnLengths[i]));
                offset+=columnLengths[i];
                ArrayAppend(rows[i],cells[i]);
            }
            line = lineReader.readLine();
        }
        return rows;        
    }

Image of the output

Upvotes: 2

Leigh
Leigh

Reputation: 28873

(Too long for comments)

Side note about the original question, the white space is being removed by the regular expression, not FileReadLine. Though personally I find the other approaches much more readable - technically it could be done with a regular expression.

For example, you could create an expression which matches the entire line and use multiple subexpressions to match the desired number of characters in each field. Then use the generated backreferences to create your pipe delimited string, ie \1|\2|\3 ....

Code: Tested with CF11

// ^ - starting with
// $ - ending with
// ( ) - subexpression or capturing group
// \1 - backreference, 1 corresponds to first subexpression
result = REReplace(line1
           ,"^(.{7})(.{10})(.{47})(.{34})(.{14})(.{13})(.{15})(.{22})(.{28})(.{18})(.{2})\s+$"
           , "\1|\2|\3|\4|\5|\6|\7|\8|\9|\10|\11"
        );

Sample data:

A1     B         C                                              D                                 E             F            G              H                     I                           J                 K   
A2     B         C                                              D                                 E             F            G              H                     I                           J                 K   
A3     B         C                                              D                                 E             F            G              H                     I                           J                 K  

If you dump the results, and check the len() of each field, you will see the spaces are preserved. So the problem is the original expression.

for (currValue in listToArray(line1, "|")) {
    writeOutput("<br>["& len(currValue) &"] "& currValue );
}

Results:

[7 ] A1     |
[10] B         |
[47] C                                              |
[34] D                                 |
[14] E             |
[13] F            |
[15] G              |
[22] H                     |
[28] I                           |
[18] J                 |
[2 ] K |

Upvotes: 1

Alex
Alex

Reputation: 7833

Why even bother with regex if you have fixed column widths? Consider the following:

<cfscript>

    line = "ab  12  xy 654   +1234    ";
    columnLengths = [4, 4, 3, 6, 1, 8];

    o = readFixedLine(line, columnLengths);

    function readFixedLine(line, columnLengths) {

        var cells = [];
        var offset = 1;

        for (length in columnLengths) {
            cells.add( mid(line, offset, length) );
            offset += length;
        }

        return cells;
    }

    // cells read (the dump doesn't display whitespaces)
    writeDump(o);
    /*
        whitespaces represented as asterisks
        [1] ab**
        [2] 12**
        [3] xy*
        [4] 654***
        [5] +
        [6] 1234****
    */

    // proving that whitespaces are kept
    for (entry in o) {
        writeOutput( len(entry) & "<br>" );
    }
    /*
        4
        4
        3
        6
        1
        8
    */

</cfscript>

The mid function is failsafe (in that regard), so you don't even have to worry about empty lines or missing columns. However, fileReadLine trims trailing whitespaces, so you are out of luck here. Only Java can help at this point (read your file with Stream or BufferedReader).

Upvotes: 1

Related Questions