user460114

Reputation: 1848

ColdFusion remove blank lines from text file

I'm using the following code to update robots.txt depending on whether a specific page is flagged as allow or disallow.

<cflock type="exclusive" timeout="5">
    <cfset vRemoveLine = ListContainsNoCase(robots,"Disallow: #sURL#", "#chr(13)##chr(10)#")>
    <cfif vRemoveLine>
        <cfset robots = ListDeleteAt(robots, vRemoveLine, "#chr(13)##chr(10)#")>
    </cfif>
    <cffile action="write"
        file="#sitePath#robots.txt"
        output="#robots#"
        nameconflict="overwrite">
</cflock>

However, it isn't finished and could probably be written better. Specifically, removing a line doesn't remove its associated line break, which is most noticeable when the line is anywhere other than the very bottom of the file.

Screenshots:

1) Before removing line

2) After removing line

Note also the additional blank line at the bottom. I need to get rid of all these blank lines, as well as removing the Disallow line together with its line break.

Upvotes: 0

Views: 2242

Answers (1)

Peter Boughton

Reputation: 112170

Actually, paying more attention to your code, you can simply do...

<cfset robots = robots.replaceAll( "(?m)^Disallow: #ReEscape(sURL)#(?:\r?\n|\z)" , "" ) />

...instead of those List functions.

This removes the line-breaks for the line you've just removed, but doesn't remove any that exist elsewhere in the file (potentially for splitting up sections and improving readability).

You can of course still also use trim if you want to ensure there are no blanks at the end of the file.
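
Put together with the write step from the question, the whole update might look something like the sketch below. It only reuses the robots, sURL and sitePath variables from the question; the trim step is optional and is included here on the assumption that no leading or trailing blank lines are wanted.

```cfml
<cflock type="exclusive" timeout="5">
    <!--- Remove the Disallow line for this URL together with its own line break --->
    <cfset robots = robots.replaceAll( "(?m)^Disallow: #ReEscape(sURL)#(?:\r?\n|\z)" , "" ) />

    <!--- Optional: strip whitespace (including blank lines) from both ends --->
    <cfset robots = robots.trim() />

    <cffile action="write"
        file="#sitePath#robots.txt"
        output="#robots#"
        nameconflict="overwrite">
</cflock>
```

Two things to be aware of: cffile appends a newline when writing unless addnewline="no" is set, so the file will still end with one line break after the trim; and the regex is case-sensitive, unlike ListContainsNoCase, so add (?i) at the start if you need case-insensitive matching.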

By way of explanation, here is the above regex again, in extended/comment form:

(?x)    ## enable extended/comment mode
        ## (literal whitespace is ignored, hashes start comments, also ignored)
(?m)    ## enable multiline mode
        ## (meaning  ^ and $ match start/end of each line, as well as of entire input)

^Disallow:\  ## Match literal text "Disallow: " at start of a line.
             ## (In comment mode, a \ is needed before the space;
             ##  in standard use this is not required.)

#ReEscape(sURL)#   ## use ReEscape to avoid issues since the URL might
                   ## contain characters that are non-literal in a regex.

(?:     ## non-capturing group to contain alternation between...

    \r?\n   ## match optional carriage return followed by a newline.
|       ## or
    \z      ## match end of input (whether there is a newline there or not)
)

(To use that in CFML, wrap it in both cfsavecontent and cfoutput, then put the resulting variable inside robots.replaceAll(here,'').)
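
As a rough sketch of that wrapping (the variable name disallowRegex is just a placeholder I've picked, and the comments are shortened), it could look like this. The doubled hashes are what cfoutput needs in order to emit a single literal # for the regex comments.

```cfml
<!--- (?x) must come directly after <cfoutput>: any whitespace before it would still be literal --->
<cfsavecontent variable="disallowRegex"><cfoutput>(?x)(?m)
    ^Disallow:\        ## literal "Disallow: " at the start of a line
    #ReEscape(sURL)#   ## the URL, escaped for use in a regex
    (?: \r?\n | \z )   ## its line break, or end of input
</cfoutput></cfsavecontent>

<cfset robots = robots.replaceAll( disallowRegex , "" ) />
```

Because comment mode ignores unescaped whitespace and treats a hash as the start of a comment, this form assumes sURL contains neither. If it might, stick with the single-line version.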


If you really want to ensure there aren't multiple newlines in the file (irrespective of any changes related to removing disallow lines), the simplest way is:

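<cfset robots = robots.trim().replaceAll('\r','').replaceAll('\n{2,}',chr(10)) />

Which trims both ends, then removes all carriage returns, then replaces every run of two or more newlines with a single newline. (The replacement is chr(10) rather than '\n' because replaceAll treats a backslash in the replacement string as an escape for the next character, so '\n' there would insert a literal n.)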

(But in general I would probably recommend the initial more specific expression over blanket removal of multiple newlines.)

Upvotes: 2
