Reputation: 69

mutliline regex concatenation

I have a tough one here which I am struggling with. It involves a case of multiline search-replace and/or concatenation situation. Here is my input text:

//                                                                                                            
import tset flash_read, flash_writ;
vector  ( $tset     , (XMOSI, XMISO, XSCLK, XSTRMSTRT, XSTRMSCLK, XSTRMCKEN, XXTALIN, XXTALCPUEN, XHVREGON, XFDRESET, XGLDATA5, XGLDATA4, XGLDATA3, XGLDATA2, XGLDATA1, XGLDATA0):H, (XSTRMD3, XSTRMD2, XSTRMD1, XSTRMD0, XNSS3, XNSS2, XNSS1, XNSS0):H, XTECLOCK, XRXDATA, XRXENABLE, XTXDATA, XTXENABLE, XNRESET, XTCK, XTMS, XTDI, XTDO, XNTRST)
{
repeat 2 
 > flash_writ X0X00X0XXXXXXXXX 0000XXXX X 0 L X X 0 1 0 0 X 0; // XNTRST
repeat 9 
 > flash_writ X0X00X0XXXXXXXXX 0000XXXX X 0 L X X 0 1 1 0 X 1; // Test Logic Reset
 > flash_writ X0X00X0XXXXXXXXX 0000XXXX X 0 L X X 0 1 0 0 X 1; // Run Test Idle
repeat 2 
 > flash_writ X0X00X0XXXXXXXXX 0000XXXX X 0 L X X 0 1 1 0 X 1; // Select IR

My desired output format is this:

//                                                                                                            
import tset flash_read, flash_writ;
vector  ( $tset     , (XMOSI, XMISO, XSCLK, XSTRMSTRT, XSTRMSCLK, XSTRMCKEN, XXTALIN, XXTALCPUEN, XHVREGON, XFDRESET, XGLDATA5, XGLDATA4, XGLDATA3, XGLDATA2, XGLDATA1, XGLDATA0):H, (XSTRMD3, XSTRMD2, XSTRMD1, XSTRMD0, XNSS3, XNSS2, XNSS1, XNSS0):H, XTECLOCK, XRXDATA, XRXENABLE, XTXDATA, XTXENABLE, XNRESET, XTCK, XTMS, XTDI, XTDO, XNTRST)
{
repeat 2              > flash_writ X0X00X0XXXXXXXXX 0000XXXX X 0 L X X 0 1 0 0 X 0; // XNTRST
repeat 9              > flash_writ X0X00X0XXXXXXXXX 0000XXXX X 0 L X X 0 1 1 0 X 1; // Test Logic Reset
                      > flash_writ X0X00X0XXXXXXXXX 0000XXXX X 0 L X X 0 1 0 0 X 1; // Run Test Idle
repeat 2              > flash_writ X0X00X0XXXXXXXXX 0000XXXX X 0 L X X 0 1 1 0 X 1; // Select IR

I am looking for a unix one liner that would search for lines that contain repeat in the input text and replace the new line character at the end of repeat count with a space such that the end outcome looks like the repeat line being concatenated with its next line as illustrated in the output text with the indicated number of white spaces.

For lines that do not contain a repeat count, it is just a matter of pushing the commencement of the line to as many spaces as illustrated in the output text.

Some of the areas where I have explored to accomplish this but with futile finishes are (1) Sed with usage of branch labels, N, pattern space (2) AWK with changing the RS (3) Perl with s/// and multiline flag turned on

Granted that this could be done with nested regex if conditions in a full-fledged perl or python script but I am looking for a more elegant solution.

Upvotes: 1

Answers (2)

dawg

Reputation: 104062

In perl:

perl -0777 -lne 's/^(repeat[ ]+\d+)\s+/\1\t/mg; s/^[ ]*>/\t\t>/mg; print' file 
//                                                                                                            
import tset flash_read, flash_writ;
vector  (      , (XMOSI, XMISO, XSCLK, XSTRMSTRT, XSTRMSCLK, XSTRMCKEN, XXTALIN, XXTALCPUEN, XHVREGON, XFDRESET, XGLDATA5, XGLDATA4, XGLDATA3, XGLDATA2, XGLDATA1, XGLDATA0):H, (XSTRMD3, XSTRMD2, XSTRMD1, XSTRMD0, XNSS3, XNSS2, XNSS1, XNSS0):H, XTECLOCK, XRXDATA, XRXENABLE, XTXDATA, XTXENABLE, XNRESET, XTCK, XTMS, XTDI, XTDO, XNTRST)
{
repeat 2    > flash_writ X0X00X0XXXXXXXXX 0000XXXX X 0 L X X 0 1 0 0 X 0; // XNTRST
repeat 9    > flash_writ X0X00X0XXXXXXXXX 0000XXXX X 0 L X X 0 1 1 0 X 1; // Test Logic Reset
            > flash_writ X0X00X0XXXXXXXXX 0000XXXX X 0 L X X 0 1 0 0 X 1; // Run Test Idle
repeat 2    > flash_writ X0X00X0XXXXXXXXX 0000XXXX X 0 L X X 0 1 1 0 X 1; // Select IR

Or, you can also do:

perl -0777 -lpe 's/^(repeat[ ]+\d+)\s+/\1\t/mg; s/^[ ]*>/\t\t>/mg;' file

You may need to play with how many \t in the second substitution, but you get the idea.

Ed's awk is brilliant. You can also do something like that in perl:

perl -lne ' if (/^repeat[\h]+\d+/) {$ll=$_; next} 
            if (/^\h+>/) {$_=sprintf("%-21s%s",$ll,$_);$ll="";}
            print' file

Upvotes: 2

Ed Morton

Reputation: 204258

$ awk '
    /^repeat/ { pfx = $0; next }
    /^ >/     { $0 = sprintf("%-21s%s", pfx, $0); pfx="" }
    { print }
' file
//
import tset flash_read, flash_writ;
vector  ( $tset     , (XMOSI, XMISO, XSCLK, XSTRMSTRT, XSTRMSCLK, XSTRMCKEN, XXTALIN, XXTALCPUEN, XHVREGON, XFDRESET, XGLDATA5, XGLDATA4, XGLDATA3, XGLDATA2, XGLDATA1, XGLDATA0):H, (XSTRMD3, XSTRMD2, XSTRMD1, XSTRMD0, XNSS3, XNSS2, XNSS1, XNSS0):H, XTECLOCK, XRXDATA, XRXENABLE, XTXDATA, XTXENABLE, XNRESET, XTCK, XTMS, XTDI, XTDO, XNTRST)
{
repeat 2              > flash_writ X0X00X0XXXXXXXXX 0000XXXX X 0 L X X 0 1 0 0 X 0; // XNTRST
repeat 9              > flash_writ X0X00X0XXXXXXXXX 0000XXXX X 0 L X X 0 1 1 0 X 1; // Test Logic Reset
                      > flash_writ X0X00X0XXXXXXXXX 0000XXXX X 0 L X X 0 1 0 0 X 1; // Run Test Idle
repeat 2              > flash_writ X0X00X0XXXXXXXXX 0000XXXX X 0 L X X 0 1 1 0 X 1; // Select IR

or if you prefer brevity over clarity:

awk '/^repeat/{p=$0;next} /^ >/{$0=sprintf("%-21s",p)$0;p=""} 1' file

and if you want "in place" editing then use GNU awk:

awk -i inplace '/^repeat/{p=$0;next} /^ >/{$0=sprintf("%-21s",p)$0;p=""} 1' file

Upvotes: 1

mutliline regex concatenation

Answers (2)

Related Questions