Reputation: 23
I am trying to replace a pattern that spans two lines of a file.
Specifically, I would like to replace ,\n & with , &\n in many large files. This effectively moves the & symbol to the end of the previous line. This is very easy with Ctrl+H in an editor, but I find it difficult with sed.
So, the initial file is in the following form:
A +,
& B -,
& C ),
& D +,
& E (,
& F *,
# & G -,
& H +,
& I (,
& J +,
K ?,
The desired output is:
A +, &
B -, &
C ), &
D +, &
E (, &
F *, &
# & G -,
H +, &
I (, &
J +,
K ?,
Following previously answered questions on Stack Overflow, I tried to convert it with the commands below:
sed ':a;N;$!ba;s/,\n &/&\n /g' file1.txt > file2.txt
sed -i -e '$!N;/&/b1' -e 'P;D' -e:1 -e 's/\n[[:space:]]*/ /' file2.txt
but they fail if the symbol "#" is present in the file.
Is there a simpler way to replace the pattern, something like:
sed -i 's/,\n &/, &\n /g' file
Thank you in advance!
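For reference, here is a minimal sketch of why the one-liner above cannot match, and of its closest working equivalent. It assumes GNU sed (for -z) and exactly one space before each continuation &, as in the sample; the function name move_amp_naive is only for illustration. By default sed reads one line at a time and strips the newline, so \n never appears in the pattern space. With -z the whole file (if it contains no NUL bytes) becomes a single pattern space. In the replacement a bare & stands for the whole match, so the literal ampersand has to be written \&.

```shell
# Hypothetical helper; assumes GNU sed (-z) and one space before each "&".
move_amp_naive() {
  # -z: records are NUL-separated, so a file without NUL bytes is read
  # at once and the newline after each comma is visible to the regex.
  # \& in the replacement is a literal ampersand; a bare & would
  # re-insert the whole match.
  sed -z 's/,\n &/, \&\n /g' "$@"
}

printf 'A +,\n & B -,\n & C ),\n' | move_amp_naive
```

This version still fails on the commented line: for "F *," followed by " # & G -," and " & H +," it moves the & of the H line after the comment, so F never gets its trailing &. That is exactly the case the answers below deal with.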
Upvotes: 2
Views: 145
Reputation: 58420
This might work for you (GNU sed):
sed -E '/,$/{:a;N;/#[^\n]*$/ba
s/,((\n.*)*)\n(\s*)&/, \&\1\n\3 /;h;s/(.*)\n.*/\1/p;g;s/.*\n(.*\n)/\1/;D}' file
Form a two-line window (extended to include comment lines too, if necessary).
Format the first line and print it (along with any comments found).
Remove all but the last two lines.
Delete the first of the two remaining lines and repeat.
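The same GNU sed program can be laid out one command per line with comments, which may make the four steps above easier to follow. This is only a re-formatting of the one-liner into a multi-line script (GNU sed accepts # comment lines and newline-terminated labels); the wrapper name move_amp_window is hypothetical, and the sample input assumes a single leading space before each continuation &.

```shell
# Hypothetical wrapper around the answer's script; assumes GNU sed (-E, \s).
move_amp_window() {
  sed -E '
    # Only a line ending in a comma can receive a trailing "&".
    /,$/{
      :a
      # Append the next line; keep appending while the newest line is a comment.
      N
      /#[^\n]*$/ba
      # Move the leading "&" of the last line of the window to just after the first comma.
      s/,((\n.*)*)\n(\s*)&/, \&\1\n\3 /
      # Save the window, then print all of it except the last line.
      h
      s/(.*)\n.*/\1/p
      # Restore the window and keep only its last two lines.
      g
      s/.*\n(.*\n)/\1/
      # Delete the first remaining line; restart the cycle without reading input.
      D
    }' "$@"
}

printf 'A +,\n & B -,\n # & G -,\n & H +,\nK ?,\n' | move_amp_window
```

The commented line is printed unchanged together with the line before it, and the & of the following line ends up after that earlier line.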
Upvotes: 1
Reputation: 9855
Assuming that the line # & G -, is a commented-out line which could be uncommented later, it might make sense to handle the & in this line as well. Not knowing the purpose of the data, this might or might not be useful.
With GNU Awk, the command
awk 'BEGIN { RS=",";ORS="" } { printf "%s%s", ORS, gensub(/(\n[ \t#]*)&/, " \\&\\1 ",1); ORS=RS }' inputfile
will turn the input
A +,
& B -,
& C ),
& D +,
& E (,
& F *,
# & G -,
& H +,
& I (,
& J +,
K ?,
into
A +, &
B -, &
C ), &
D +, &
E (, &
F *, &
# G -, &
H +, &
I (, &
J +,
K ?,
This script only works correctly if the last line is terminated by a newline or if some other character follows the final comma.
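To see why the trailing newline matters, it can help to print the records awk actually produces with RS=",". A minimal sketch (show_records is a hypothetical name; a single-character RS is standard awk, so this does not need gawk): every record after the first starts with the newline that followed a comma, a file ending in ",\n" yields one last record consisting of just that newline, and a file ending in "," yields no record after the final comma.

```shell
# Hypothetical helper: print each comma-separated record between brackets.
show_records() {
  awk 'BEGIN { RS = "," } { printf "[%s]\n", $0 }' "$@"
}

printf 'A +,\n & B -,\n' | show_records   # last record holds only the newline
printf 'A +,\n & B -,' | show_records     # no record after the final comma
```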
Explanation:
RS=","
sets the comma as the input record separator instead of a newline.
ORS=""
sets the output record separator to an empty string before the first record.
printf "%s%s", ORS, gensub(...)
prepends the record separator instead of appending it.
gensub
a GNU-specific substitution function which allows backreferences to matched groups.
/(\n[ \t#]*)&/
search pattern: the parentheses define a group (1) that consists of a newline \n followed by any sequence of spaces, tabs or comment characters [ \t#]*. The group is followed by an & character.
" \\&\\1 "
replacement: a space followed by &, followed by the captured group (1) (\\1) and an additional space to replace the removed &. (The \\& is necessary to get a literal & character instead of inserting the whole match.)
ORS=RS
sets the output record separator to , after the first record (it is reassigned after every record, in fact), so that a comma is prepended to the second and all following records. This ensures that the last record, which should be a newline, does not get a trailing comma.
The version of the GNU Awk script below works as expected only if the last line of the input file is not terminated by a newline. Otherwise it creates an additional line containing a single comma, because the last record (the newline) is terminated by the output record separator ",".
awk 'BEGIN { RS=ORS="," } { print gensub(/(\n[ \t#]*)&/, " \\&\\1 ",1) }' inputfile
If the input file ends with a newline, the output will be
...
I (, &
J +,
K ?,
,
with no newline after the last comma.
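Since gensub is specific to GNU Awk, here is a sketch of the same record-based idea for other awks (for example mawk or BSD awk), swapping gensub for the standard match() and substr() functions. It is an adaptation, not the answer's own code; move_amp_portable is a hypothetical name, and like the first gawk script it expects the input to end with a newline.

```shell
# Sketch for awks without gensub (e.g. mawk, BSD awk); assumes the
# input ends with a newline, like the first gawk script above.
move_amp_portable() {
  awk 'BEGIN { RS = ","; ORS = "" }
  {
    # First newline + blanks/# + "&" in this record, if any.
    if (match($0, /\n[ \t#]*&/)) {
      # Everything matched except the trailing "&" (plays the role of \1).
      lead = substr($0, RSTART, RLENGTH - 1)
      # " &" goes before the newline; a space takes the place of the "&".
      $0 = substr($0, 1, RSTART - 1) " &" lead " " substr($0, RSTART + RLENGTH)
    }
    # Prepend the separator: empty for the first record, "," afterwards.
    printf "%s%s", ORS, $0
    ORS = RS
  }' "$@"
}

printf 'A +,\n & B -,\n # & G -,\n & H +,\nK ?,\n' | move_amp_portable
```

As with the gawk version, the & on the commented line is replaced by a space and the comment line itself receives a trailing &.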
Upvotes: 2
Reputation: 163342
Using sed
sed -En 'H;${g;s/^\n//;s/((\n *#.*)*)\n +&(.*)/ \&\1\n \3/gmp}' file
Explanation
-E
Enable extended regular expressions
-n
Prevent the default printing of the pattern space
H
Append the current line to the hold space
${
When at the last line:
g
Copy the hold space to the pattern space, overwriting it
s/^\n//;
Remove the leading newline (added by the first H) from the pattern space
s/
Start the substitution
((\n *#.*)*)
Capture group 1: optionally repeat matching a newline, optional spaces, # and the rest of the line
\n +&(.*)
Match a newline and 1+ spaces, then match & and capture the rest of the line in group 3
/
Substitute with:
 \&\1\n \3
The replacement: a space, an escaped (literal) &, group 1, a newline, a space and group 3
/
End the substitution
gmp
g replaces all occurrences, m enables multiline mode, p prints the pattern space if a substitution was made
Output
A +, &
B -, &
C ), &
D +, &
E (, &
F *, &
# & G -,
H +, &
I (, &
J +,
K ?,
See a bash demo.
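The H ... ${g;s/^\n//} part of this command is a common sed idiom for collecting the whole file into one pattern space before substituting. The small demo below makes that visible by joining the collected lines with |; slurp_demo and join_amp_slurp are hypothetical names, and GNU sed is assumed. Note also that, because of -n, only the p flag prints: if no substitution happens anywhere in the file, the command prints nothing at all.

```shell
# Assumes GNU sed. Collect every line in the hold space; on the last
# line copy it back, drop the leading newline and show the result.
slurp_demo() {
  sed -n 'H;${g;s/^\n//;s/\n/|/g;p}' "$@"
}

# The answer's command, unchanged, wrapped in a hypothetical function.
join_amp_slurp() {
  sed -En 'H;${g;s/^\n//;s/((\n *#.*)*)\n +&(.*)/ \&\1\n \3/gmp}' "$@"
}

printf 'a\nb\nc\n' | slurp_demo
printf 'A +,\n & B -,\n # & G -,\n & H +,\nK ?,\n' | join_amp_slurp
```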
Upvotes: 1
Reputation: 29050
If you use GNU sed and your file does not contain NUL characters (ASCII code 0), you can use its -z option to process the whole file as one single string, together with the multi-line mode of the substitute command (m flag). The m flag is not strictly needed, but it simplifies things a bit (. and negated bracket expressions such as [^#] do not match newlines):
$ sed -Ez ':a;s/((\`|\n)[^#]*,)((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm;ta' file
A +, &
B -, &
C ), &
D +, &
E (, &
F *, &
# & G -,
H +, &
I (, &
J +,
K ?,
This corresponds to your textual specification and to your desired output for the example you show, but it is a bit complicated. Instead of processing lines that end with a newline character, it processes substrings that begin with a newline character (or the beginning of the file) and end before the next newline character. Let's call these "chunks".
We search for a sequence of chunks of the form AB*C where:
A
is a chunk (possibly the first) not containing #. It is matched by (\`|\n)[^#]*, which means beginning-of-file-or-newline, followed by any number of characters except newline and #, followed by a comma.
B*
is any number (including none) of chunks containing #. It is matched by \n.*#.* which means newline, followed by any number of characters except newline, followed by # and any number of characters except newline.
C
is a chunk starting with a newline, followed by blanks and &. It is matched by \n[[:blank:]]*& which means newline, followed by any number of blanks and a &.
If we find such an AB*C sequence, we add a space and a & at the end of A, we do not change B*, and we replace the first & in C by a space. We repeat until no such sequence is found.
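The repeat step is what the :a ... ta loop does, and it is needed because a single /g pass cannot fix two consecutive lines: the match that fixes one line consumes the newline that the next match would have to start its A chunk with. A small comparison, assuming GNU sed; one_pass and looped are hypothetical names, and the second is the answer's command unchanged.

```shell
# Assumes GNU sed. Same substitution, applied once with /g (no loop).
one_pass() {
  sed -Ez 's/((\`|\n)[^#]*,)((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm' "$@"
}

# The answer's command: ta jumps back to :a as long as a substitution succeeded.
looped() {
  sed -Ez ':a;s/((\`|\n)[^#]*,)((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm;ta' "$@"
}

printf 'A +,\n & B -,\n & C ),\n' | one_pass   # the "& C" line is left as it was
printf 'A +,\n & B -,\n & C ),\n' | looped
```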
Note: if the commas can be followed by blanks before the newline, we must take them into account. If you want to keep them:
$ sed -Ez ':a;s/((\`|\n)[^#]*,[[:blank:]]*)((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm;ta' file
Else:
$ sed -Ez ':a;s/((\`|\n)[^#]*,)[[:blank:]]*((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm;ta' file
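To make the difference between the two variants concrete, here is what they do with a line whose comma is followed by a trailing blank. GNU sed is assumed; keep_blanks and drop_blanks are hypothetical names wrapping the two commands above unchanged.

```shell
# Assumes GNU sed. Keeps the blanks after the comma (they end up before " &").
keep_blanks() {
  sed -Ez ':a;s/((\`|\n)[^#]*,[[:blank:]]*)((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm;ta' "$@"
}

# Matches the blanks outside group 1, so they are dropped.
drop_blanks() {
  sed -Ez ':a;s/((\`|\n)[^#]*,)[[:blank:]]*((\n.*#.*)*)(\n[[:blank:]]*)&/\1 \&\3\5 /gm;ta' "$@"
}

printf 'A +, \n & B -,\n' | keep_blanks   # first line: "A +,  &"
printf 'A +, \n & B -,\n' | drop_blanks   # first line: "A +, &"
```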
Upvotes: 1
Reputation: 11227
Using sed
$ sed ':a;N;s/\n \+\(&\) \(.*\)/ \1\n \2/;ba' input_file
A +, &
B -, &
C ), &
D +, &
E (, &
F *,
# & G -, &
H +, &
I (, &
J +,
Upvotes: 2