Shell remove string including newlines

Question

I am currently working on a custom source patcher and I'm having trouble replacing one string with another when the match includes newlines.

For instance, I want to remove this pattern:

\n/* @patch[...]*/

The goal is to go from this:

this.is = code ;
/* @patch beta
    blah blah
*/
if (!this.is) return 0 ;
/* @patch end */

... to this:

this.is = code ;
if (!this.is) return 0 ;

And not this:

this.is = code ;
<- newline
if (!this.is) return 0 ;
<- newline

In my shell script, I use the sed command to do this:

sed -e "s|/\* @patch.*\*/||g" $file > $file"_2"

This works pretty well, but the newlines are still there.

This variant doesn't work, because sed reads its input one line at a time and the pattern never sees the newline:

sed -e "s|\n/\* @patch.*\*/||g" $file > $file"_2"

The method from "How can I replace a newline (\n) using sed?" doesn't work either, nor does tr (second answer in the same thread).

Do you have any solution for this? Even heavy ones are welcome; performance is not important here.

P.S.: I am working on a web application, so the files in question are JavaScript. I'm on Mac OS X Yosemite, but this seems to be a common issue for bash users on any system.

I found another solution using Node.js, for those who have trouble with their awk version:

node -e "console.log(process.argv[1].replace(/[\r\n]\/\* @patch([\s\S]*?)\*\//mg, ''))" "`cat $filepath`"

Ed Morton · Accepted Answer

sed is for simple substitutions on individual lines; for anything else you should be using awk:

$ awk -v RS='^$' -v ORS= '{gsub(/[*][/]/,"\0"); gsub(/\n[/][*] @patch[^\0]+\0/,""); gsub(/\0/,"*/")} 1' file
this.is = code ;
if (!this.is) return 0 ;

The above uses GNU awk for multi-char RS to read the whole file as a single string (with other awks you just build up the string line by line and process it in the END section) and relies on your file not containing any NUL (\0) characters.
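
If you are not sure whether a file is free of NUL characters, you can check before relying on \0 as the placeholder. The following is only a sketch: has_nul is a hypothetical helper name, and clean.txt and with_nul.bin are made-up test files. It deletes NULs with tr and compares the result with the original using cmp.

```shell
#!/bin/sh
# has_nul FILE: succeeds (exit 0) if FILE contains at least one NUL byte.
# Hypothetical helper, not part of the original answer.
has_nul() {
    # tr -d strips NUL bytes; if the stripped copy differs from the
    # original, the original contained at least one NUL.
    ! tr -d '\000' < "$1" | cmp -s - "$1"
}

# Two made-up test files: one clean, one with an embedded NUL.
printf 'plain text\n' > clean.txt
printf 'a\000b\n' > with_nul.bin

for f in clean.txt with_nul.bin; do
    if has_nul "$f"; then
        printf '%s: contains NUL, pick another placeholder\n' "$f"
    else
        printf '%s: no NUL found\n' "$f"
    fi
done
```

The same idea works for any other single-character placeholder: substitute its octal code for 000 in the tr call.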

The first gsub() changes every */ to one char (a NUL) so the 2nd gsub() can negate it in a bracket expression as part of your desired regexp and then the third gsub() restores any remaining NULs to */s.
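
The three steps can be tried end to end on the sample from the question. This is a minimal sketch, not the exact command above: it assumes a POSIX awk, builds the record line by line, and uses SUBSEP rather than NUL as the one-character stand-in for */. The file name sample.js and the function name strip_patches are made up for the illustration.

```shell
#!/bin/sh
# Sample input copied from the question (sample.js is a made-up file name).
cat > sample.js <<'EOF'
this.is = code ;
/* @patch beta
    blah blah
*/
if (!this.is) return 0 ;
/* @patch end */
EOF

# strip_patches FILE: hypothetical wrapper around the three gsub() steps.
strip_patches() {
    awk '
        { rec = rec $0 "\n" }        # collect the whole file in one string
        END {
            # Step 1: turn every "*/" into a single character (SUBSEP).
            gsub(/[*]\//, SUBSEP, rec)
            # Step 2: remove newline + "/* @patch" + anything up to the
            # next SUBSEP, i.e. up to the end of that comment.
            gsub("\n/[*] @patch[^" SUBSEP "]+" SUBSEP, "", rec)
            # Step 3: restore any "*/" that belonged to other comments.
            gsub(SUBSEP, "*/", rec)
            printf "%s", rec
        }
    ' "$1"
}

result=$(strip_patches sample.js)
printf '%s\n' "$result"
# prints:
# this.is = code ;
# if (!this.is) return 0 ;
```

Note that the pattern starts with a newline, so a @patch comment on the very first line of the file would not be removed by this sketch.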

With non-gawk you need to build up the string:

awk '{rec = rec $0 RS} END{gsub(/[*][/]/,"\0",rec); gsub(/\n[/][*] @patch[^\0]+\0/,"",rec); gsub(/\0/,"*/",rec); printf "%s",rec}' file

and it sounds like your awk requires the /s in the bracket expressions escaped so it doesn't see them as the terminating char of the RE:

awk '{rec = rec $0 RS} END{gsub(/[*][\/]/,"\0",rec); gsub(/\n[\/][*] @patch[^\0]+\0/,"",rec); gsub(/\0/,"*/",rec); printf "%s",rec}' file

If your awk doesn't like NUL chars then use some control character, e.g. (where every ^C is a literal control-C character):

awk '{rec = rec $0 RS} END{gsub(/[*][\/]/,"^C",rec); gsub(/\n[\/][*] @patch[^^C]+^C/,"",rec); gsub("^C","*/",rec); printf "%s",rec}' file

or use the pre-defined SUBSEP control char that awk uses to separate array indices (note you now need to double-up the backslashes in the REs that are concatenation of literal strings with SUBSEPs since they are now dynamic regexps instead of constant regexps, see http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps for details):

awk '{rec = rec $0 RS} END{gsub(/[*][\/]/,SUBSEP,rec); gsub("\\n[\\/][*] @patch[^"SUBSEP"]+"SUBSEP,"",rec); gsub(SUBSEP,"*/",rec); printf "%s",rec}' file
