Reputation: 61
I know there are several different ones open and answered but mine is a little different. I am trying to do this in bash.
I have this file:
Line1 asd asd asd \
asd asd asd \

Line2 asd asd asd \
asd asd asd \

Line3 asd asd asd \
asd asd asd \

Line4 asd asd asd \
asd asd asd \
The output I would like is:
Line1 asd asd asd asd asd asd
Line2 asd asd asd asd asd asd
Line3 asd asd asd asd asd asd
Line4 asd asd asd asd asd asd
So that it is easier to read in with a bash loop. What command would allow me to do this?
Thanks in advance.
Upvotes: 1
Views: 1819
Reputation: 438123
Note: The OP's own answer suggests that the input has Windows-style line endings (\r\n). However, given that this wasn't stated in the question, the solutions here only match Unix-style ones (\n). To match \r\n line endings, replace \n with '"$(printf '\r')"'\n (sic), or, in bash, '$'\r''\n in the sed commands below. (With GNU sed you could simply use \r\n, but POSIX sed doesn't recognize \r as an escape sequence.)
A corrected version of the OP's own solution that also handles lines ending in \ that precede empty lines correctly:
sed -e ':a' -e '$!{N;ba' -e '}; s/ \\\n[[:blank:]]*/ /g' filename
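As a quick check, the command can be tried on a small file. The sketch below assumes a hypothetical sample (tab-indented continuation lines, an empty line between the groups, and a last line that also ends in \); it also runs a variant with G; placed before the s command. The file contents and variable names are illustrative only.

```shell
# Hypothetical sample file (an assumption, not the asker's real data):
# continuation lines are tab-indented, an empty line separates the groups,
# and the very last line ends in " \" too.
tmp=$(mktemp)
printf 'Line1 asd asd asd \\\n\tasd asd asd \\\n\nLine2 asd asd asd \\\n\tasd asd asd \\\n' > "$tmp"

# The command as given: the final " \" stays, because no newline follows
# it in the pattern space.
joined=$(sed -e ':a' -e '$!{N;ba' -e '}; s/ \\\n[[:blank:]]*/ /g' "$tmp")

# Variant with G; before the substitution: G appends a newline (plus the
# empty hold space), so the final " \" is replaced as well.
joined_g=$(sed -e ':a' -e '$!{N;ba' -e '}; G; s/ \\\n[[:blank:]]*/ /g' "$tmp")

printf '%s\n' "$joined"
printf '%s\n' "$joined_g"
rm -f "$tmp"
```

Both variants leave a trailing space on each joined line, because every matched run is replaced with a single space.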
-e ':a' -e '$!{N;ba' -e '}' is a common sed idiom: a loop that reads all input lines at once into the pattern space (input buffer) - BSD sed requires multiple -e options to make this work (or, alternatively, a multi-line script).
Note that the last line of the sample input ends in \ as well, which is unusual, and would result in that \ NOT getting removed; if you really need to handle this case, insert G; before s/.../.../ above, which effectively appends another newline to the pattern space and therefore causes the last \ to be removed too.
s/ \\\n[[:blank:]]*/ /g then operates on all input lines and globally (g) replaces each run of a single space followed by \ (\\), followed by a newline (\n), followed by any number of spaces and/or tab chars ([[:blank:]]*), with a single space.
In short: <space>\ at the end of a line causes that line to be joined with the next line, after removing the trailing \ and stripping leading whitespace from the next line.
Note:
- The solutions below are meant to work with different awk and sed flavors.
- The awk solutions are preferable, because they do not read the input all at once, which can be problematic with large files. (Arguably, they are also easier to understand.)
- The sample input is passed via here-documents with a quoted delimiter (<<'EOF') to preserve the string unmodified; without quoting EOF, the shell's own string-literal processing would parse the embedded line-continuations and join the lines before the commands ever see the string.
These solutions simply remove \<newline> sequences, and thus join the lines as is, with no separator; this is what read does by default, for instance.
However, these solutions have two advantages over read:
- Other \ instances (those not at the end of a line) are left alone.
- sed and awk are much faster with more than just a few input lines.
awk solution:
awk '/\\$/ { printf "%s", substr($0, 1, length($0)-1); next } 1' <<'EOF'
Line1 starts here\
 and ends here.
Line2 starts here, \
 continues here,\
  and ends here.
EOF
Line1 starts here and ends here.
Line2 starts here,  continues here,  and ends here.
/\\$/ matches a \ at the end ($) of a line, signaling line continuation.
substr($0, 1, length($0)-1) removes that trailing \ from the input line, $0.
With printf "%s", the (modified) current line is printed without a trailing newline, which means that whatever print command comes next will directly append to it, effectively joining the current and the next line.
next finishes processing of the current line.
1 is a common awk idiom that is shorthand for { print }, i.e., for simply printing the input line (with a trailing \n).
sed solution:
$ sed -e ':a' -e '$!{N;ba' -e '}; s/\\\n//g' <<'EOF'
Line1 starts here\
 and ends here.
Line2 starts here, \
 continues here,\
  and ends here.
EOF
Line1 starts here and ends here.
Line2 starts here,  continues here,  and ends here.
Note the two double spaces in the last line, because all whitespace is preserved.
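To illustrate the first advantage over read mentioned above, here is a small sketch. The sample line with Windows-style path separators is made up for the purpose, and the variable names are illustrative.

```shell
# Hypothetical input: backslashes inside the line (a Windows-style path)
# plus a line-ending "\" that signals continuation.
tmp=$(mktemp)
printf '%s\n' 'copy C:\temp\file \' 'D:\backup' > "$tmp"

# awk removes only the line-ending backslash; the others are left alone.
awk_out=$(awk '/\\$/ { printf "%s", substr($0, 1, length($0)-1); next } 1' "$tmp")

# read without -r also joins the lines, but strips the other backslashes too.
read_out=$(while IFS= read line; do printf '%s\n' "$line"; done < "$tmp")

printf 'awk:  %s\n' "$awk_out"
printf 'read: %s\n' "$read_out"
rm -f "$tmp"
```

The awk result keeps the path separators intact, whereas the read result loses them.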
bash solution:
The following solution is alluringly simple, but is not fully robust and is a security risk: it can result in the execution of arbitrary commands:
# Store input filename, passed as the 1st argument,
# in variable $file.
file=$1
# Construct a string that results in a valid shell command containing a
# *literal* here-document with *unquoted* EOF delimiter 0x3 - chosen so
# that it doesn't conflict with the input.
#
# When the resulting command is evaluated by `eval`, the *shell itself*
# performs the desired line-continuation processing, BUT:
# '$'-prefixed tokens in the input, including command substitutions
# ('$(...)' and '`...`'), ARE EXPANDED, therefore:
# CAUTION: Maliciously constructed input can result in
# execution of arbitrary commands.
eval "cat <<$(printf '\3')
$(cat "$file")"
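To make the caution concrete, the following sketch feeds the same technique a harmless stand-in for malicious input (an embedded $(echo INJECTED)). The temporary file and its contents are hypothetical; unlike the snippet above, the sketch also writes the closing here-document delimiter explicitly.

```shell
# Hypothetical input: one continued line, plus a line that contains a
# command substitution (a harmless stand-in for something malicious).
tmp=$(mktemp)
printf '%s\n' 'first part \' 'second part' 'today is $(echo INJECTED)' > "$tmp"

# Same idea as above: an unquoted here-document delimiter (byte 0x3), so
# the shell itself processes the line continuations and the expansions.
delim=$(printf '\3')
result=$(eval "cat <<$delim
$(cat "$tmp")
$delim")

printf '%s\n' "$result"
rm -f "$tmp"
```

The continued line is joined as intended, but the embedded command runs as well, which is exactly why this approach should only be used with trusted input.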
These solutions normalize whitespace as follows: any trailing whitespace before \<newline> is removed, as is leading whitespace from the next line; the resulting lines are then joined by a single space.
Whitespace in lines not participating in line continuation is preserved as is. The latter distinguishes these solutions from choroba's Perl solution.
awk solution:
awk '
contd { contd=0; sub(/^[[:blank:]]+/, "") }
/\\$/ { contd=1; sub(/[[:blank:]]*\\$/, ""); printf "%s ", $0; next }
1' <<'EOF'
Line1 starts here \
and ends here.
I am a loner.
Line3 starts here, \
continues here, \
and ends here.
EOF
Line1 starts here and ends here.
I am a loner.
Line3 starts here, continues here, and ends here.
contd (which defaults to 0 / false in a Boolean context) is used as a flag to indicate whether the previous line signaled line continuation with a trailing \.
If the flag is set (pattern contd), it is reset right away (although it may get set again below if the current line itself continues on the next line), and leading whitespace is trimmed from the current line (sub(/^[[:blank:]]+/, "")); note that not specifying a target variable as the 3rd argument implicitly targets the whole input line, $0.
/\\$/ matches a \ at the end ($) of a line, signaling line continuation. In that case:
- the flag is set (contd=1),
- trailing whitespace before the \ is removed (sub(/[[:blank:]]*\\$/, "")) along with that \ itself,
- and the result is printed with a trailing space but without a newline, using printf "%s ".
next then proceeds to the next input line, without processing further commands for the current line.
1 is a common awk idiom that is shorthand for { print }, i.e., for simply printing the input line (with a trailing \n); note that this print command is reached in two cases:
- lines that neither end in \ nor continue a previous line, which are printed unmodified;
- the last line of a run of continued lines, which is printed after its leading whitespace has been trimmed.
sed solution:
$ sed -e ':a' -e '$!{N;ba' -e '}; s/[[:blank:]]*\\\n[[:blank:]]*/ /g' <<'EOF'
Line1 starts here \
and ends here.
I am a loner.
Line3 starts here, \
continues here, \
and ends here.
EOF
Line1 starts here and ends here.
I am a loner.
Line3 starts here, continues here, and ends here.
Line-ending and line-beginning whitespace is normalized to a single space for lines involved in continuation.
Note how the line without a trailing \ is printed unmodified.
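Coming back to the question's goal of looping over the joined lines: a minimal sketch that feeds the normalizing awk command into a while loop with read -r. The sample file and the bracketed output format are assumptions for illustration.

```shell
# Hypothetical sample: one continued line with extra whitespace around the
# continuation, and one line that is not continued.
tmp=$(mktemp)
printf '%s\n' 'Line1 starts here   \' '    and ends here.' 'I am a loner.' > "$tmp"

# Join continued lines with awk first, then loop over the result; read -r
# is safe here because no line-ending "\" is left to interpret.
out=$(awk '
  contd  { contd=0; sub(/^[[:blank:]]+/, "") }
  /\\$/  { contd=1; sub(/[[:blank:]]*\\$/, ""); printf "%s ", $0; next }
  1' "$tmp" |
  while IFS= read -r line; do
    printf '[%s]\n' "$line"
  done)

printf '%s\n' "$out"
rm -f "$tmp"
```

Doing the joining in awk and keeping -r on read means that any other backslashes in the data reach the loop untouched.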
Upvotes: 1
Reputation: 203665
$ awk -v RS= '{gsub(/\s*\\\s*/," ")}1' file
Line1 asd asd asd asd asd asd
Line2 asd asd asd asd asd asd
Line3 asd asd asd asd asd asd
Line4 asd asd asd asd asd asd
Use [[:space:]] instead of \s if you don't have GNU awk.
Note, though, that any time you write a loop in shell just to manipulate text, you have the wrong approach, so doing the above in preparation for simplifying a bash read loop is probably a bad idea overall.
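For awk versions without \s, here is a sketch of the [[:space:]] variant on a made-up sample in which an empty line separates the groups (which is what RS= relies on). Note that this sketch replaces each match with a single space, an assumption made here so that the joined parts stay separated.

```shell
# Hypothetical sample: two groups separated by an empty line; continuation
# lines are tab-indented and every non-empty line ends in " \".
tmp=$(mktemp)
printf 'Line1 asd asd asd \\\n\tasd asd asd \\\n\nLine2 asd asd asd \\\n\tasd asd asd \\\n' > "$tmp"

# RS= (paragraph mode): each block of lines up to an empty line is one
# record, so each group is joined and printed as one output line.
joined=$(awk -v RS= '{ gsub(/[[:space:]]*\\[[:space:]]*/, " ") } 1' "$tmp")

printf '%s\n' "$joined"
rm -f "$tmp"
```

Each output line ends in a single trailing space, because the final \ of every group is replaced as well.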
Upvotes: 1
Reputation: 61
EDIT
This command replaces the space and backslash at the end of a line, plus the line break and the tab at the start of the next line, with a single space:
sed ':a;N;$!ba;s/ \\\x0D\x0A\x09/ /g' filename
line1 asd asd asd \
asd asd asd
to
line1 asd asd asd asd asd asd
I then can use:
sed '/^[[:space:]]*$/d' filename
to remove the unneeded blank lines between these lines.
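The \x0D\x0A\x09 escapes are, as far as I know, a GNU sed extension. Here is a sketch of the same two steps that instead splices in a literal carriage return via printf; the sample file with \r\n line endings is hypothetical, and the final tr -d '\r' is only added here to make the result easier to compare.

```shell
# A literal carriage return, for sed implementations that do not
# understand \r or \x0D in a regex.
cr=$(printf '\r')

# Hypothetical sample with \r\n line endings, a tab-indented continuation
# line, and an otherwise empty line between the two entries.
tmp=$(mktemp)
printf 'line1 asd asd asd \\\r\n\tasd asd asd\r\n\r\nline2 asd asd asd\r\n' > "$tmp"

# Step 1: join " \" + CR + LF + leading blanks into a single space.
# Step 2: delete lines that contain only whitespace (a lone CR counts).
# Step 3 (added for this sketch only): drop the remaining CRs.
out=$(sed -e ':a' -e '$!{N;ba' -e '}; s/ \\'"$cr"'\n[[:blank:]]*/ /g' "$tmp" |
  sed '/^[[:space:]]*$/d' |
  tr -d '\r')

printf '%s\n' "$out"
rm -f "$tmp"
```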
Upvotes: -1
Reputation: 241918
Perl solution:
perl -pe 's/\\$// and chomp' < input > output
s/// is a substitution. \\ matches the backslash, $ matches end-of-line.
chomp removes a trailing newline, if present.
To also remove the leading whitespace, use
's/^ +//; s/\\$// and chomp'
instead.
^ matches beginning-of-line; the space followed by + matches one or more spaces.
Upvotes: 3
Reputation: 80931
The bash built-in read supports backslash-continued lines when you don't use -r. (Other than when you need exactly this support, you should always use -r.)
So it should read those lines from a file, etc. just fine (assuming they don't have other backslash escape sequences in them that need to be preserved).
$ while IFS= read line; do
echo "[$line]"
done < <(printf 'Line1 asd asd asd \
asd asd asd \

Line2 asd asd asd \
asd asd asd \

Line3 asd asd asd \
asd asd asd \

Line4 asd asd asd \
asd asd asd \
')
[Line1 asd asd asd asd asd asd ]
[Line2 asd asd asd asd asd asd ]
[Line3 asd asd asd asd asd asd ]
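One detail worth knowing: if the input ends while a continuation is still pending (the last line ends in \), read returns a non-zero status, so the loop body does not run for that final group. A minimal sketch, with hypothetical shortened input, that also processes such a final group by testing whether $line is non-empty:

```shell
# Hypothetical, shortened input: two groups separated by an empty line;
# the input ends right after the second group's trailing "\".
out=$(printf 'Line1 asd \\\nasd \\\n\nLine2 asd \\\nasd \\\n' |
  while IFS= read line || [ -n "$line" ]; do
    # Runs for every complete line and, thanks to the -n test, also for a
    # final group that read could not terminate before end of input.
    printf '[%s]\n' "$line"
  done)

printf '%s\n' "$out"
```

Without the || [ -n "$line" ] part, only the first group would be printed here.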
Upvotes: 3