JohnFreeman1212

Reputation: 145

How do I extract a string that spans multiple lines with sed?

I need to extract the string between CAKE_FROSTING(" and ",. If the string spans multiple lines, the quotation marks and the newline at each line break must be removed. I have a command (thanks, Stack Overflow) that goes in that direction but does not do exactly this. How can I fix it (and can you briefly explain the fixes)? I am using bash on Linux.

sed -En ':a;N;s/.*CAKE_FROSTING\(\n?\s*?"([^,]*).*/\1/p;ba' filesToCheck/* > result.txt

filesToCheck/file.h

something
CAKE_FROSTING(
"is supreme", 
"[i][agree]") something else
something more
something else
CAKE_FROSTING(
"is."kinda" neat"
"in fact", 
"[i][agree]") something else
something more

result.txt current

is supreme"
is."kinda" neat"

result.txt desired

is supreme
is."kinda" neat in fact

Edit: With help from @D_action I now have

sed -En ':a;N;s/.*CAKE_FROSTING\(\n?\s*?"([^,]*).*,/\1/p;ba' filesToCheck/* > result.txt

This produces almost the correct output, but it still contains unwanted quotation marks and one newline too many:

result.txt current

is supreme" 
is."kinda" neat"
"in fact" 

Upvotes: 1

Views: 154

Answers (3)

potong

Reputation: 58420

This might work for you (GNU sed):

sed '/^CAKE_FROSTING($/!d;z;:a;N;s/^"\([^[].*\)".*/\1/mg;ta;s/^.\(.*\)\n.*/\1/;y/\n/ /' file

Focus on the lines that consist solely of CAKE_FROSTING( and delete all others.

Having established the starting point, zap that line, then gather up the following lines until one that begins with "[ is reached, trimming each line as we go.

Remove the initial newline and the unwanted last line.

Then replace any remaining newlines with spaces.
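
The steps above can be sketched as a commented, multi-line script. This is only an illustration: it assumes GNU sed (for z and the M flag), and the function name extract_frosting and the sample variable are hypothetical, with the sample text copied from the question.

```shell
#!/bin/sh
# Hypothetical sample input: a copy of filesToCheck/file.h from the question.
sample='something
CAKE_FROSTING(
"is supreme",
"[i][agree]") something else
something more
something else
CAKE_FROSTING(
"is."kinda" neat"
"in fact",
"[i][agree]") something else
something more'

# extract_frosting is an illustrative name; the sed commands are those of the
# one-liner, written one per line. Reads stdin. Requires GNU sed (z, M flag).
extract_frosting() {
  sed '
    # Delete every line that is not exactly CAKE_FROSTING(
    /^CAKE_FROSTING($/!d
    # Empty the pattern space so only the following lines are collected
    z
    :a
    # Append the next input line
    N
    # On each line starting with a quote (but not with "[), keep only the
    # text between the first and the last quote; m makes ^ match per line
    s/^"\([^[].*\)".*/\1/mg
    # If something was trimmed, go back and read another line
    ta
    # Drop the leading newline and the final, untrimmed line
    s/^.\(.*\)\n.*/\1/
    # Join the remaining lines with spaces
    y/\n/ /
  '
}

printf '%s\n' "$sample" | extract_frosting
# prints:
#   is supreme
#   is."kinda" neat in fact
```

Note that the first address only matches when CAKE_FROSTING( is alone on its line, as it is in the sample file.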

Upvotes: 2

Wiktor Stribiżew

Reputation: 626826

You can also use perl here to match the string between CAKE_FROSTING( and the first comma, remove double quotes from the start and end of lines, and replace line breaks with spaces, all only inside the matches:

perl -0777 -ne 'while (/CAKE_FROSTING\(\s*"([^,]*)"/g) {$a=$1; $a =~ s/^"|"$|(\R+)/$1?" ":""/gme; print "$a\n"}' file

See the online demo. Note that -0777 slurps the whole file so that the regex engine can "see" the line breaks.

The CAKE_FROSTING\(\s*"([^,]*)" pattern matches CAKE_FROSTING(, zero or more whitespace characters, and a ", then captures into Group 1 zero or more non-comma characters up to the last " before the first comma.

The $a=$1; $a =~ s/^"|"$|(\R+)/$1?" ":""/gme; print "$a\n" part assigns the Group 1 value to the $a variable. The ^"|"$|(\R+) pattern matches a " at the start or end of a line, or captures one or more line breaks (\R+) into Group 1. If Group 1 matched, the replacement is a space; otherwise it is an empty string. Only the contents of $a are printed.

Upvotes: 1

sseLtaH

Reputation: 11227

Using GNU sed:

$ sed -En ':a;N;s/.*CAKE_FROSTING\(\n?\s"([^"]*[^\n,]*)["].*\n"([[:alpha:] ]+)?.*/\1 \2/p;ba' input_file
is supreme
is."kinda" neat in fact

Upvotes: 5
