Sean Easter

Reputation: 869

How can I write a sed, awk or other regex one-liner to concatenate contiguous lines beginning with a match?

I have files whose lines begin with values I'd like to match in contiguous groups, removing the newline characters between them. "Contiguous groups" means that we only want to remove newlines between pairs of adjacent matching lines. Diffs provide a handy example: say we wanted to remove the newline character between added lines, i.e. all lines beginning with a plus sign (+).

Adapting an answer to a related question gets close, but it only joins pairs instead of continuing to join all of the following matching lines:

sed '/^+/N;s/\n+/ /' path/to/file.diff

(Note that the expected output also concatenates lines beginning with a space. That was a formatting mistake on my part, but since very helpful answers have been written to address the intent expressed in the output, I'm leaving it as is so as not to invalidate them.)

Example input:

--- some/file/path  2021-02-21 16:33:40.000000000 -0600
+++ another/file/path   2021-02-21 16:33:52.000000000 -0600
@@ -32,7 +32,7 @@
 this
 sentence
-lost
-many
+gained
+several
+other
 words
@@ -91,9 +91,10 @@
 this
 one
-just
-lost
-many

Desired output:

--- some/file/path  2021-02-21 16:33:40.000000000 -0600
+++ another/file/path   2021-02-21 16:33:52.000000000 -0600
@@ -32,7 +32,7 @@
 this sentence
-lost 
-many
+gained several other
 words
@@ -91,9 +91,10 @@
 this one
-just 
-lost 
-many

Upvotes: 1

Views: 90

Answers (4)

David C. Rankin

Reputation: 84569

This awk solution stretches the notion of a one-liner a bit, but it isn't too bad by long one-liner standards:

awk '
    !found && /^+[^+]/ { printf "%s", $0; found=1; next }
    /^[^+]/  { printf (found?"\n%s\n":"%s\n"), $0; found=0; next }
    found    { printf " %s", substr($0,2); next }
             { print }
' file

Example Use/Output

With your input in the file creatively named file, you can select-copy and middle-mouse-paste the command into an xterm with that file in the current directory, and you would get:

$ awk '
>     !found && /^+[^+]/ { printf "%s", $0; found=1; next }
>     /^[^+]/  { printf (found?"\n%s\n":"%s\n"), $0; found=0; next }
>     found    { printf " %s", substr($0,2); next }
>              { print }
> ' file
--- some/file/path  2021-02-21 16:33:40.000000000 -0600
+++ another/file/path   2021-02-21 16:33:52.000000000 -0600
@@ -32,7 +32,7 @@
this
sentence
-lost
-many
+gained several other
words
@@ -91,9 +91,10 @@
this
one
-just
-lost
-many

Note: your problem statement discusses only concatenating lines beginning with '+', but your expected output also joins the first two lines after the diff position information. It is unclear whether you want one, the other, or both.

Upvotes: 1

potong

Reputation: 58463

This might work for you (GNU sed):

sed -E ':a;N;s/^(([+ ]).*)\n\2/\1 /;$!ta;P;D' file

Append the next line of input to the pattern space.

If the first line begins with + or a space and the second line begins with the same character, remove the newline and the repeated character and replace them with a single space.

Repeat the process until a match fails.

Print/delete the first line and repeat.
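
The steps above can be laid out one command per line. This is only a sketch, assuming GNU sed; the function name join_runs is hypothetical and simply wraps the same command so that it reads standard input:

```shell
# join_runs: hypothetical wrapper around the sed command above (assumes GNU sed).
join_runs() {
    sed -E '
        # Loop label.
        :a
        # Append the next input line to the pattern space.
        N
        # Both lines start with the same "+" or " ": replace the newline
        # and the repeated prefix character with a single space.
        s/^(([+ ]).*)\n\2/\1 /
        # Unless this is the last line, loop again after a successful join.
        $!ta
        # Print up to the first newline, delete that part, restart with the rest.
        P
        D
    '
}

# Demo on part of the first hunk from the question.
printf '%s\n' ' this' ' sentence' '-lost' '+gained' '+several' '+other' ' words' | join_runs
```

On that sample the demo should print " this sentence", "-lost", "+gained several other" and " words", one per line.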

Upvotes: 2

karakfa

Reputation: 67497

Something like this should work:

$ awk '{p=substr($0,1,1); 
        if(p!=pp && pp!="-") printf "\n"; 
        pp=p; 
        printf "%s%s",$0,p=="-"?"\n":""}' file

--- some/file/path  2021-02-21 16:33:40.000000000 -0600
+++ another/file/path   2021-02-21 16:33:52.000000000 -0600
@@ -32,7 +32,7 @@
 this sentence
-lost
-many
+gained+several+other
 words
@@ -91,9 +91,10 @@
 this one
-just
-lost
-many
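
The output above keeps the leading + of each joined line. If the goal is the question's desired output, where the repeated prefix is replaced by a space, one possible variant is sketched below. It is not the command above but a rewrite under the assumption that only runs of lines starting with + or a space should be joined; the function name join_same_prefix and the variable names are hypothetical:

```shell
# join_same_prefix: hypothetical helper; joins runs of lines that share a
# leading "+" or " " and leaves every other line untouched.
join_same_prefix() {
    awk '
        {
            p = substr($0, 1, 1)              # first character of this line
            if (NR > 1 && p == prev && (p == "+" || p == " ")) {
                buf = buf " " substr($0, 2)   # same run: drop prefix, join with a space
            } else {
                if (NR > 1) print buf         # run ended: flush the buffered line
                buf = $0                      # start a new buffered line
            }
            prev = p
        }
        END { if (NR > 0) print buf }         # flush the final buffered line
    '
}

# Demo on part of the first hunk from the question.
printf '%s\n' ' this' ' sentence' '-lost' '-many' '+gained' '+several' '+other' ' words' | join_same_prefix
```

Buffering one output line and flushing it when the prefix changes avoids the leading blank line and keeps each - line on its own row.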

Upvotes: 0

M. Nejat Aydin

Reputation: 10133

A sed one-liner which will join adjacent lines beginning with a + character:

sed -e ':a' -e '$!N;s/^\(+.*\)\n+/\1 /;ta' -e 'P;D' file
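
As a quick check of how this behaves, here is a sketch that wraps the same command in a function reading standard input. The name join_plus is hypothetical, and the trace in the comments reflects GNU sed's behaviour:

```shell
# join_plus: hypothetical wrapper around the command above.
#   :a      loop label
#   $!N     unless on the last line, append the next input line
#   s/.../  if both lines start with "+", replace "\n+" with a space
#   ta      after a successful join, loop back to :a
#   P;D     print and delete the first line, restart with the remainder
join_plus() {
    sed -e ':a' -e '$!N;s/^\(+.*\)\n+/\1 /;ta' -e 'P;D'
}

# Demo on part of the first hunk from the question.
printf '%s\n' ' this' ' sentence' '-lost' '+gained' '+several' '+other' ' words' | join_plus
```

On that sample only the + lines are joined into "+gained several other"; the lines beginning with a space or a minus sign are printed unchanged.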

Upvotes: 1
