MALON

Reputation: 742

Remove trailing whitespace recursively only at end of file using grep/sed?

Basically, I've got about 1,500 files and the last character of any of these files should not be any type of white space.

How do I check a bunch of files to make sure that they don't end in some form of whitespace (newline, space, carriage return, tab, etc.)?

Upvotes: 1

Views: 8880

Answers (9)

akond

Reputation: 16060

Version 2. Linux syntax. Proper command.

find /directory/you/want -type f | \
xargs --verbose -L 1 sed -n --in-place -r \
':loop;/[^[:space:]]/ {p;b;}; N;b loop;'

Version 1. Remove whitespace at the end of each line. FreeBSD syntax.

find /directory/that/holds/your/files -type f | xargs -L 1 sed -i '' -E 's/[[:space:]]+$//'

The [[:space:]] class matches spaces and tabs (among other whitespace), so there is no need to type a literal tab into the expression. If you do want a literal tab in the shell, press Ctrl-V and then Tab.

Upvotes: 0

yabt

Reputation: 1

Using dd without ed:

while IFS= read -r -d $'\0' file; do
   filesize="$(wc -c < "${file}")"
   while [[ $(tail -c 1 "${file}" | tr -dc '[:space:]' | wc -c) -eq 1 ]]; do
      printf "" | dd  of="${file}" seek=$(($filesize - 1)) bs=1 count=1
      let filesize-=1
   done
done < <(find -x "/path/to/dir" -type f -not -empty -print0)

Upvotes: 0

yabt

Reputation: 11

You can also use ed to delete trailing whitespace at the end of the file and dd to delete a final newline. Keep in mind that ed reads the whole file into memory and edits it in place without making a backup first:

# tested on Mac OS X using Bash
while IFS= read -r -d $'\0' file; do
   # remove white space at end of (non-empty) file
   # note: ed will append final newline if missing
   printf '%s\n' H '$g/[[:space:]]\{1,\}$/s///g' wq | ed -s "${file}"
   printf "" | dd  of="${file}" seek=$(($(stat -f "%z" "${file}") - 1)) bs=1 count=1
   #printf "" | dd  of="${file}" seek=$(($(wc -c < "${file}") - 1)) bs=1 count=1
done < <(find -x "/path/to/dir" -type f -not -empty -print0)

Upvotes: 1

ghostdog74

Reputation: 342819

ruby -e 's=ARGF.read;s.rstrip!;print s' file

This reads the whole file, strips any trailing whitespace, and prints the contents, so it is not suited to very large files.
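
If holding the whole file in memory is a concern, one alternative is to inspect only the bytes at the end of the file and cut it there. What follows is a minimal shell sketch rather than a tested production tool: the function name strip_eof_whitespace is made up for illustration, and it assumes head -c and mktemp are available (both are common, but neither is guaranteed by POSIX).

```shell
# Hypothetical helper: remove all whitespace bytes from the end of file "$1".
strip_eof_whitespace() {
    file=$1
    size=$(wc -c < "$file") || return 1
    size=$((size))                  # normalize: some wc versions pad with spaces
    keep=$size

    # Walk backwards from the end while the byte at position "keep" is whitespace.
    while [ "$keep" -gt 0 ]; do
        byte=$(head -c "$keep" "$file" | tail -c 1 | od -An -to1 | tr -d ' \n')
        case $byte in
            011|012|013|014|015|040) keep=$((keep - 1)) ;;  # tab LF VT FF CR space
            *) break ;;
        esac
    done

    # Nothing to strip: leave the file untouched.
    if [ "$keep" -eq "$size" ]; then
        return 0
    fi

    # Copy the part to keep, then write it back over the original file.
    tmp_copy=$(mktemp) || return 1
    if head -c "$keep" "$file" > "$tmp_copy" && cat "$tmp_copy" > "$file"; then
        rm -f "$tmp_copy"
    else
        rm -f "$tmp_copy"
        return 1
    fi
}
```

Called as strip_eof_whitespace path/to/file, it rewrites the file only when there is something to remove. It streams the file once per trailing whitespace byte, so it suits files with a short whitespace tail.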

Upvotes: 1

mob

Reputation: 118665

A Perl solution:

# command-line arguments are the names of the files to check.
# output is names of files that end with trailing whitespace
for (@ARGV) {
  open F, '<', $_ or do { warn "cannot open $_: $!\n"; next };
  seek F, -1, 2;                # seek to before last char in file
  print "$_\n" if <F> =~ /\s/;
  close F;
}
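
The same last-byte check can be sketched in plain shell, which may be handy where Perl is not installed. The function names below are illustrative only, and the find loop assumes that no file name contains a newline.

```shell
# Return 0 if the last byte of file "$1" is whitespace, 1 otherwise.
ends_with_whitespace() {
    [ -s "$1" ] || return 1         # empty or missing file: nothing to flag
    last_byte=$(tail -c 1 "$1" | od -An -to1 | tr -d ' \n')
    case $last_byte in
        011|012|013|014|015|040) return 0 ;;    # tab, LF, VT, FF, CR, space
        *) return 1 ;;
    esac
}

# Print the name of every regular file under directory "$1" that ends in whitespace.
list_whitespace_terminated() {
    find "$1" -type f | while IFS= read -r path; do
        if ends_with_whitespace "$path"; then
            printf '%s\n' "$path"
        fi
    done
}
```

For example, list_whitespace_terminated /path/to/dir prints one offending file name per line, and prints nothing when every file is clean.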

Upvotes: 1

Dennis Williamson

Reputation: 360485

awk '{if (flag) print line; line = $0; flag = 1} END {gsub("[[:space:]]+$","",line); printf "%s", line}'

Edit:

New version:

The sed command removes all the trailing lines that consist only of whitespace, then the awk command removes the ending newline.

sed '/^[[:space:]]*$/{:a;$d;N;/\n[[:space:]]*$/ba}' inputfile |
    awk '{if (flag) print line; line = $0; flag = 1} END {printf "%s", line}'

The disadvantage is that it reads the file twice.

Edit 2:

Here's an all-awk solution that only reads the file once. It accumulates white-space-only lines in a manner similar to the sed command above.

#!/usr/bin/awk -f

# accumulate a run of white-space-only lines so they can be printed or discarded
/^[[:space:]]*$/ {
    accumlines = accumlines nl $0
    nl = "\n"
    accum = 1
    next
}

# print the previous line and any accumulated lines, store the current line for the next pass
{
    if (flag) print line
    if (accum) { print accumlines; accum = 0 }
    accumlines = nl = ""
    line = $0
    flag = 1
}

# print the last line without a trailing newline after removing all trailing whitespace
# the resulting output could be null (nothing rather than 0x00)
# note that we're not printing the accumulated lines since they're part of the
# trailing white-space we're trying to get rid of
END {
    gsub("[[:space:]]+$","",line)
    printf "%s", line
}

Edit 3:

  • removed unnecessary BEGIN clause
  • changed lines to accumlines so it's easier to distinguish from line (singular)
  • added comments

Upvotes: 3

j_random_hacker

Reputation: 51296

Just for fun, here's a plain C answer:

#include <stdio.h>
#include <ctype.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int c, bufsize = 100, ns = 0;
    char *buf = malloc(bufsize);

    while ((c = getchar()) != EOF) {
        if (isspace(c)) {
            if (ns == bufsize) buf = realloc(buf, bufsize *= 2);
            buf[ns++] = c;
        } else {
            fwrite(buf, 1, ns, stdout);
            ns = 0;
            putchar(c);
        }
    }

    free(buf);
    return 0;
}

Not much longer than Dennis's awk solution, and, dare I say it, easier to understand! :-P

Upvotes: 1

glenn jackman

Reputation: 247062

Might be easier reading the file from the bottom to the top:

tac filename | 
awk '
    /^[[:space:]]*$/ && !seen {next} 
    /[^[:space:]]/   && !seen {gsub(/[[:space:]]+$/,""); seen=1}
    seen
' | 
tac

Upvotes: 1

j_random_hacker

Reputation: 51296

This will strip all trailing whitespace:

perl -e '$s = ""; while (defined($_ = getc)) { if (/\s/) { $s .= $_; } else { print $s, $_; $s = ""; } }' < infile > outfile

There's probably an equivalent in sed, but I'm much more familiar with Perl; I hope that works for you. Basic idea: if the next character is whitespace, save it; otherwise, print any saved characters followed by the character just read. If we hit EOF after reading one or more whitespace characters, they won't be printed.

This will simply detect trailing whitespace, exiting with status 1 if the file ends in whitespace and 0 otherwise:

perl -e 'while (defined($_ = getc)) { $last = $_; } exit($last =~ /\s/);' < infile

[EDIT] The above describes how to detect or change a single file. If you have a large directory tree containing files that you want to apply the changes to, you can put the command in a separate script:

fix.pl

#!/usr/bin/perl
$s = "";
while (defined($_ = getc)) {
    if (/\s/) { $s .= $_; } else { print $s, $_; $s = ""; }
}

and use it in conjunction with the find command:

find /top/dir -type f -exec sh -c 'mv "$1" "$1.bak" && fix.pl < "$1.bak" > "$1"' sh {} ';'

This will move each original file to a backup file ending in ".bak". (It would be a good idea to test this on a small test fileset first.)
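
If the command might be run more than once over the same tree, it may be worth skipping the backups themselves so that a second pass does not create ".bak.bak" files or overwrite the saved originals. A rough sketch of that idea follows; the name filter_tree is hypothetical, and the filter is assumed to be a single command (such as the path to fix.pl) that reads stdin and writes stdout.

```shell
# Hypothetical helper: run filter "$2" over every regular file under "$1",
# keeping each original as FILE.bak and leaving existing backups alone.
filter_tree() {
    find "$1" -type f ! -name '*.bak' -exec sh -c '
        filter=$1
        shift
        for f do
            [ -e "$f.bak" ] && continue     # never overwrite an existing backup
            mv "$f" "$f.bak" && "$filter" < "$f.bak" > "$f"
        done
    ' sh "$2" {} +
}
```

Usage would look like filter_tree /top/dir ./fix.pl, with fix.pl made executable first. Passing the file names as arguments to sh, rather than embedding {} in the script text, keeps names with quotes or spaces from being interpreted by the shell.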

Upvotes: 2
