Virgil Sisoe

Reputation: 205

regex command line with single-line flag

I need to use a regex in a bash script to replace text in a file, where the match may span multiple lines.
In the other regex engines I know, I would pass the s flag, but I am having a hard time doing the same from bash.

As far as I know, sed doesn't support this feature.
Perl obviously does, but I can't make it work as a one-liner: perl -i -pe 's/<match.+match>//s' $file

example text:

DONT_MATCH

<match some text here
    and here
match>

DONT_MATCH

Upvotes: 2

Views: 1003

Answers (3)

potong

Reputation: 58391

This might work for you (GNU sed):

sed '/^<match/{:a;/match>$/!{N;ba};s/.*//}' file

Gather up the lines from one that begins with <match to one that ends with match>, then replace the whole collection with nothing.

N.B. This will act on all such collections throughout the file, and the end-of-file condition will not affect the outcome. To only act on the first, use:

sed '/^<match/{:a;/match>$/!{N;ba};s/.*//;:b;n;bb}' file

To only act on the second such collection use:

sed -E '/^<match/{:a;/match>$/!{N;ba};x;s/^/x/;/^(x{2})$/{x;s/.*//;x};x}' file

The regex /^(x{2})$/ can be tailored to do more intricate matching, e.g. /^(x|x{3,6})$/ would match the first and the third through sixth collections.
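
As a rough check of the first two commands, here is a sketch that runs them on the question's sample text and on a hypothetical two-block input (the A/KEEP/B lines are made up for illustration). It assumes GNU sed, as the answer does.

```shell
# The sample text from the question.
sample=$(printf '%s\n' 'DONT_MATCH' '' '<match some text here' '    and here' 'match>' '' 'DONT_MATCH')

# Act on every collection: the three-line block collapses to one empty line.
all=$(printf '%s\n' "$sample" | sed '/^<match/{:a;/match>$/!{N;ba};s/.*//}')
printf '%s\n' "$all"

# Hypothetical input with two blocks, to show the first-only variant.
two=$(printf '%s\n' 'A' '<match one' 'match>' 'KEEP' '<match two' 'match>' 'B')
first=$(printf '%s\n' "$two" | sed '/^<match/{:a;/match>$/!{N;ba};s/.*//;:b;n;bb}')
printf '%s\n' "$first"
```

Note that s/.*// empties the pattern space but sed still prints it, so each removed collection leaves a single empty line behind.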

Upvotes: 2

ikegami

Reputation: 385685

By default, . doesn't match a line feed. The s flag simply makes . match any character.

You are reading the file a line at a time, so you can't possibly match something that spans multiple lines. Use -0777 to treat the entire input as one line.

perl -i -0777pe's/<match.+match>//s' "$file"

Upvotes: 4

John1024

Reputation: 113834

With GNU sed:

$ sed -z 's/<match.*match>//g' file
DONT_MATCH



DONT_MATCH

With any sed:

$ sed 'H;1h;$!d;x; s/<match.*match>//g' file
DONT_MATCH



DONT_MATCH

Both the above approaches read the whole file into memory. If you have a big file (e.g. gigabytes), you might want a different approach.

Details

With GNU sed, the -z option reads in files with NUL as the record separator. For text files, which never contain NUL, this has the effect of reading the whole file in.

For ordinary sed, the whole file can be read in with the following steps:

  • H - Append current line to hold space
  • 1h - If this is the first line, overwrite the hold space with it
  • $!d - If this is not the last line, delete pattern space and jump to the next line.
  • x - Exchange hold and pattern space to put whole file in pattern space
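
The steps above can also be written as a multi-line sed script with one command per line. This is only a sketch of the same one-liner, run on the question's sample text; the comments restate the list.

```shell
result=$(printf '%s\n' 'DONT_MATCH' '' '<match some text here' '    and here' 'match>' '' 'DONT_MATCH' |
sed '
  # Append the current line to the hold space.
  H
  # On line 1, overwrite the hold space instead, so it has no leading newline.
  1h
  # Unless this is the last line, delete the pattern space and start the next cycle.
  $!d
  # Last line: swap the spaces so the pattern space holds the whole file.
  x
  # The file is now one string, so the match can span line breaks.
  s/<match.*match>//g
')
printf '%s\n' "$result"
```

Because the substitution only runs once, on the last line, the g flag matters only when the file contains more than one non-overlapping match; with the greedy .* there is usually just one.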

Upvotes: 1
