vixalien

Reputation: 390

sed: Delete match and newline (with CRLF)

I want to search for console.log in all files in a directory, then delete the lines that match, including the newline that ends each of them. AWK, Perl, PowerShell and other Windows solutions are welcome too (besides sed).

However, because I'm on Windows, my files use CRLF line endings, so a plain sed /d doesn't work. It just replaces the line with a blank one.

Before:

hello.world();
console.log("Debug info");
foo.bar();

After:

hello.world();
foo.bar();

The command I currently use, find pages -type f -exec sed -i -e '/console.log/d' {} \; only replaces each matching line with a blank line, like this:

hello.world();

foo.bar();

Upvotes: 0

Views: 236

Answers (1)

tripleee

Reputation: 189679

In theory the d command in sed should remove the entire line, but lines with DOS line endings are a pesky corner case; clearly, not all sed versions operate correctly in their presence.
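
To check what a particular sed build actually does with CRLF input, something like the following sketch may help. The sample file is hypothetical, and the script assumes a POSIX-like shell with grep and od available. On a typical Unix-like system the carriage return is just the last character of the line and is deleted along with it; a Windows port which handles line endings differently may show other bytes in the dump.

```shell
#!/bin/sh
# Hypothetical sample file with DOS (CRLF) line endings.
sample=$(mktemp)
printf 'hello.world();\r\nconsole.log("Debug info");\r\nfoo.bar();\r\n' > "$sample"

# Count the lines which end in a carriage return (3 here: every line is CRLF).
cr=$(printf '\r')
crlf_lines=$(grep -c "$cr\$" "$sample")
echo "lines ending in CR: $crlf_lines"

# Delete the matching line, then dump the result byte by byte so that
# any leftover \r or \n is visible.
after=$(sed '/console\.log/d' "$sample" | od -c)
printf '%s\n' "$after"

rm -f "$sample"
```

If the dump shows a lone \r \n pair where the deleted line used to be, that sed is leaving the line ending behind, which matches the blank line described in the question.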

If you are willing to use Perl instead, this should be straightforward:

perl -ni -e 'print unless /console\.log/' pages/**

Demo: https://ideone.com/NNYocH

(The wildcard pages/** is not portable, but convenient in place of find if you have a shell which supports this.)

If you have GNU Awk, you can similarly use that with the -i inplace option to overwrite the input files.

awk -i inplace 'BEGIN{RS=ORS="\r\n"} !/console\.log/' pages/**

There may be a similar problem with DOS CR line endings with some Awk versions, but I believe GNU Awk should cope.

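Finally, here is a variant which uses a temporary file instead of -i inplace, so it does not require GNU Awk. (A multi-character RS is an extension whose behavior POSIX leaves unspecified, though common implementations such as GNU Awk and mawk support it.)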

find pages -type f -exec sh -c '
    for file; do
        awk "BEGIN{RS=ORS=\"\\r\\n\"} !/console\.log/" "$file" >"$file.$$"
        mv "$file.$$" "$file"
    done' _ {} +
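
As a hedged alternative, here is a sketch which leaves RS at its default, so it does not depend on multi-character RS support. It assumes an Awk which reads the file as raw bytes, as on Unix-like systems: the carriage return then stays at the end of each record and is printed back unchanged. The demo directory is hypothetical, and the && is an addition so that the original file is only replaced if awk succeeded.

```shell
#!/bin/sh
# Hypothetical demo directory standing in for "pages".
dir=$(mktemp -d)
printf 'hello.world();\r\nconsole.log("Debug info");\r\nfoo.bar();\r\n' > "$dir/demo.js"

find "$dir" -type f -exec sh -c '
    for file; do
        # Default RS is a newline, so each record keeps its trailing CR.
        awk "!/console\\.log/" "$file" >"$file.$$" &&
        mv "$file.$$" "$file"
    done' _ {} +

# Dump the result byte by byte to confirm the CRLF endings survived.
result=$(od -c "$dir/demo.js")
rm -rf "$dir"
printf '%s\n' "$result"
```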

Modern versions of find should allow you to use + instead of \; after -exec, which should be significantly more performant than spawning a new subprocess for each found file.
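
The difference is easy to observe with a small experiment. In this sketch (the three empty files are made up for the demonstration), the inner shell prints how many file names it received each time it was started.

```shell
#!/bin/sh
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"

# With \; the command runs once per file: three runs, one argument each.
per_file=$(find "$dir" -type f -exec sh -c 'echo "$#"' _ {} \;)

# With + the file names are batched: here a single run receives all three.
batched=$(find "$dir" -type f -exec sh -c 'echo "$#"' _ {} +)

rm -rf "$dir"
printf 'per file:\n%s\nbatched:\n%s\n' "$per_file" "$batched"
```

With only three files, the + form starts one process; with a very large tree, find splits the names into as many batches as the argument length limit requires.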

Notice how a literal dot in a regular expression should properly be backslash-escaped or put in a character class (an unescaped dot is a regex metacharacter which matches any character except newline).
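
A quick way to see the effect, using grep on a made-up sample file:

```shell
#!/bin/sh
sample=$(mktemp)
printf 'console.log("a");\nconsole_log("b");\nconsoleXlog("c");\n' > "$sample"

# An unescaped dot matches any character, so all three lines match.
unescaped=$(grep -c 'console.log' "$sample")

# Escaped or bracketed, the dot only matches a literal dot.
escaped=$(grep -c 'console\.log' "$sample")
bracketed=$(grep -c 'console[.]log' "$sample")

rm -f "$sample"
echo "unescaped=$unescaped escaped=$escaped bracketed=$bracketed"
# prints: unescaped=3 escaped=1 bracketed=1
```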

Upvotes: 1
