Reputation: 61

remove newline from end of string in bash - line continuations

I know there are several different ones open and answered but mine is a little different. I am trying to do this in bash.

I have this file:

Line1 asd asd asd \
    asd asd asd \

Line2 asd asd asd \
    asd asd asd \

Line3 asd asd asd \
    asd asd asd \

Line4 asd asd asd \
    asd asd asd \

The output I would like is:

Line1 asd asd asd asd asd asd
Line2 asd asd asd asd asd asd
Line3 asd asd asd asd asd asd
Line4 asd asd asd asd asd asd

So that it is easier to read in a bash loop. What command would allow me to do this?

Thanks in advance.

Upvotes: 1

Answers (5)

mklement0

Reputation: 438123

Note:

The first solution below reflects the OP's specific whitespace-handling requirements; see the bottom for generic line-continuation processing.
The solutions here are POSIX-compliant, so they should work on most Unix-like platforms (verified on OSX and Linux).
The OP's own solution suggests that the input has Windows-style line endings (\r\n). However, given that this wasn't stated in the question, the solutions here only match Unix-style ones (\n). To match \r\n line endings, replace \n with '"$(printf '\r')"'\n (sic), or, in bash, '$'\r''\n in the sed commands below. (With GNU sed you could simply use \r\n, but POSIX sed doesn't recognize \r as an escape sequence).
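
If the quoting above feels fragile, one alternative (not part of the original solutions) is to strip the carriage returns first and then run the Unix-style command unchanged. A minimal sketch, assuming it is acceptable to remove every \r from the input and to emit \n-only line endings; join_crlf is a hypothetical helper name:

```shell
# Hypothetical helper: convert CRLF input to LF, then join "<space>\" continuations.
# Assumption: no carriage returns in the input need to be preserved.
join_crlf() {
  tr -d '\r' |
    sed -e ':a' -e '$!{N;ba' -e '}; s/ \\\n[[:blank:]]*/ /g'
}

# Sample input with Windows-style line endings and a tab-indented continuation.
printf 'Line1 asd \\\r\n\tasd\r\n' | join_crlf
```

This should print the single line "Line1 asd asd". If the output must keep \r\n line endings, they would have to be added back afterwards.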

A corrected version of the OP's own solution that also handles lines ending in \ that precede empty lines correctly.

sed -e ':a' -e '$!{N;ba' -e '}; s/ \\\n[[:blank:]]*/ /g' filename

-e ':a' -e '$!{N;ba' -e '}' is a common sed idiom: a loop that reads all input lines at once into the pattern space (input buffer) - BSD sed requires multiple -e options to make this work (or, alternatively, a multi-line script).
- Note that the sample input precedes the very last newline with \ as well, which is unusual, and would result in that \ NOT getting removed; if you really need to handle this case, insert G; before s/.../.../ above, which effectively appends another newline to the pattern space and therefore causes the last \ to be removed too.
The text-replacement command s/ \\\n[[:blank:]]*/ /g then operates on all input lines and globally (g) replaces each run of a single space followed by \ ( \\), followed by a newline (\n), followed by any number of spaces and/or tab characters ([[:blank:]]*), with a single space.
In short: <space>\ at the end of a line causes that line to be joined with the next line, after removing the trailing \ and stripping leading whitespace from the next line.
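
To see the G; variant from the note above in action, here is a small sketch run against a shortened version of the sample input. join_continuations is a hypothetical wrapper name, not something from the question:

```shell
# Hypothetical wrapper around the command above, with G; inserted so that
# a "<space>\" before the very last newline is processed as well.
join_continuations() {
  sed -e ':a' -e '$!{N;ba' -e '}; G; s/ \\\n[[:blank:]]*/ /g' "$@"
}

join_continuations <<'EOF'
Line1 asd asd asd \
    asd asd asd \

Line2 asd asd asd \
    asd asd asd \
EOF
```

This should yield one output line per logical line, with the empty lines gone. Each output line still ends in a single space, because the <space>\ that precedes an empty line (or the end of the input) is itself replaced with a space.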

Note:

The following solutions come in both awk and sed flavors.
Generally, the awk solutions are preferable, because they do not read the input all at once, which can be problematic with large files. (Arguably, they are also easier to understand.)
Note that the here-documents used as sample input below use a quoted EOF delimiter (<<'EOF') to preserve the string unmodified; without quoting EOF, the shell's own string-literal processing would parse the embedded line-continuations and join the lines before the commands ever see the string.
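
The difference between the two delimiter forms can be checked directly. A small sketch; the function names quoted and unquoted are made up for illustration:

```shell
# Quoted delimiter: the here-document is passed through literally.
quoted() {
  cat <<'EOF'
one\
two
EOF
}

# Unquoted delimiter: the shell removes \<newline> before cat sees the text.
unquoted() {
  cat <<EOF
one\
two
EOF
}

quoted     # two lines: "one\" and "two"
unquoted   # one line: "onetwo"
```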

Generic line-continuation processing without whitespace handling:

These solutions simply remove \<newline> sequences, and thus join the lines as is, with no separator; this is what read does by default, for instance.

However, these solutions have two advantages over read:

Line-interior \ instances are left alone.
sed and awk are much faster with more than just a few input lines.

`awk` solution:

awk '/\\$/ { printf "%s", substr($0, 1, length($0)-1); next } 1' <<'EOF'
Line1 starts here\
 and ends here.

Line2 starts here, \
 continues here,\
  and ends here.
EOF
Line1 starts here and ends here.

Line2 starts here,  continues here,  and ends here.

/\\$/ matches a \ at the end ($) of a line, signaling line continuation.
substr($0, 1, length($0)-1) removes that trailing \ from the input line, $0.
By using printf "%s", the (modified) current line is printed without a trailing newline, which means that whatever print command comes next will directly append to it, effectively joining the current and the next line.
next finishes processing of the current line.
1 is a common awk idiom that is shorthand for { print }, i.e., for simply printing the input line (with a trailing \n).

`sed` solution:

$ sed -e ':a' -e '$!{N;ba' -e '}; s/\\\n//g' <<'EOF'
Line1 starts here\
 and ends here.

Line2 starts here, \
 continues here,\
  and ends here.
EOF
Line1 starts here and ends here.

Line2 starts here,  continues here,  and ends here.

Note the two double spaces in the last line, because all whitespace is preserved.

[NOT RECOMMENDED] A pure shell (e.g., `bash`) solution:

The following solution is alluringly simple, but is not fully robust and is a security risk: it can result in the execution of arbitrary commands:

# Store input filename, passed as the 1st argument,
# in variable $file.
file=$1

# Construct a string that results in a valid shell command containing a
# *literal* here-document with *unquoted* EOF delimiter 0x3 - chosen so
# that it doesn't conflict with the input.
#
# When the resulting command is evaluated by `eval`, the *shell itself* 
# performs the desired line-continuation processing, BUT:
# '$'-prefixed tokens in the input, including command substitutions
# ('$(...)' and '`...`'), ARE EXPANDED, therefore:
# CAUTION: Maliciously constructed input can result in
#          execution of arbitrary commands.
eval "cat <<$(printf '\3')
$(cat "$file")"
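
If a pure-shell approach is still wanted, the eval can be avoided altogether by joining the lines in a read -r loop. This is a sketch of my own rather than part of the solution above; join_lines is a hypothetical name, and, like any shell loop over text, it will be much slower than sed or awk on larger inputs:

```shell
# Hypothetical helper: join backslash-continued lines read from stdin.
# The input is treated purely as data; nothing in it is expanded or executed.
join_lines() {
  buf=
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *\\) buf=$buf${line%\\} ;;                # continuation: drop the \ and keep collecting
      *)   printf '%s\n' "$buf$line"; buf= ;;   # complete logical line: print and reset
    esac
  done
  # Flush a continuation left dangling at the end of the input.
  if [ -n "$buf" ]; then printf '%s\n' "$buf"; fi
}

join_lines <<'EOF'
Line1 starts here\
 and ends here, $(echo not run).
EOF
```

The command substitution in the sample input is printed literally instead of being executed. As with the awk solution above, lines are joined as is, with no whitespace handling.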

Generic line-continuation processing with normalization of whitespace:

These solutions normalize whitespace as follows: any trailing whitespace before \<newline> is removed, as is leading whitespace from the next line; the resulting lines are then joined by a single space.
Whitespace in lines not participating in line continuation is preserved as is. (The latter distinguishes these solutions from choroba's Perl solution.)

`awk` solution

awk '
  contd { contd=0; sub(/^[[:blank:]]+/, "") } 
  /\\$/ { contd=1; sub(/[[:blank:]]*\\$/, ""); printf "%s ", $0; next } 
  1' <<'EOF'
Line1 starts here   \
      and ends here.
  I am a loner.
Line3 starts here,   \
      continues here,    \
and ends here.
EOF
Line1 starts here and ends here.
  I am a loner.
Line3 starts here, continues here, and ends here.

Variable contd (which defaults to 0 / false in a Boolean context) is used as a flag to indicate whether the previous line signaled line continuation with a trailing \.
If the flag is set (pattern contd), it is reset right away (although it may get set again below if the line being continued too continues on the next line), and leading whitespace is trimmed from the current line (sub(/^[[:blank:]]+/, "")); note that not specifying a target variable as the 3rd argument implicitly targets the whole input line, $0.
/\\$/ matches a \ at the end ($) of a line, signaling line continuation.
- Therefore, the flag is set (contd=1),
- trailing whitespace before the line-ending \ is removed (sub(/[[:blank:]]*\\$/, "")), along with that \ itself,
- and the result is printed with a trailing space, but without a newline, courtesy of printf "%s ".
- next then proceeds to the next input line, without processing further commands for the current line.
1 is a common awk idiom that is shorthand for { print }, i.e., for simply printing the input line (with a trailing \n); note that this print command is reached in two cases:
- Any lines not involved in line continuation, which are printed unmodified.
- Any lines that end a line continuation (form part of a continuation but do not themselves continue on the next line), which are printed with leading whitespace removed, due to the modification performed by the first action.

`sed` solution

$ sed -e ':a' -e '$!{N;ba' -e '}; s/[[:blank:]]*\\\n[[:blank:]]*/ /g' <<'EOF'
Line1 starts here   \
      and ends here.
  I am a loner.
Line3 starts here,   \
      continues here,    \
and ends here.
EOF
Line1 starts here and ends here.
  I am a loner.
Line3 starts here, continues here, and ends here.

Line-ending and line-beginning whitespace is normalized to a single space for lines involved in continuation. Note how the line without a trailing \ is printed unmodified.

Upvotes: 1

Ed Morton

Reputation: 203665

$ awk -v RS= '{gsub(/\s*\\\s*/," "); sub(/ $/,"")}1' file
Line1 asd asd asd asd asd asd
Line2 asd asd asd asd asd asd
Line3 asd asd asd asd asd asd
Line4 asd asd asd asd asd asd

Use [[:space:]] instead of \s if you don't have GNU awk.
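
For reference, a sketch of what the portable variant could look like. The wrapper name join_paragraphs is hypothetical, and the single-space replacement plus the trailing-space cleanup are my assumptions about the desired result (they keep the words on joined lines separated):

```shell
# Hypothetical wrapper: POSIX character classes instead of GNU awk's \s.
# RS= selects paragraph mode, so each blank-line-separated block is one record.
# Assumption: a single space is wanted wherever two lines are joined.
join_paragraphs() {
  awk -v RS= '{ gsub(/[[:space:]]*\\[[:space:]]*/, " "); sub(/ $/, "") } 1' "$@"
}

join_paragraphs <<'EOF'
Line1 asd asd asd \
    asd asd asd \

Line2 asd asd asd \
    asd asd asd \
EOF
```

Note that this approach replaces every backslash in the input, not only line-ending ones, so it is only suitable when the data contains no other backslashes.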

Note, though, that any time you write a loop in shell just to manipulate text, you have the wrong approach, so doing the above in preparation for simplifying a bash read loop is probably a bad idea overall.

Upvotes: 1

Austin

Reputation: 61

EDIT

This command replaces the trailing space and backslash, the CRLF line ending, and the tab at the start of the next line with a single space.

sed ':a;N;$!ba;s/ \\\x0D\x0A\x09/ /g' filename

line1 asd asd asd \
     asd asd asd

line1 asd asd asd asd asd asd

I then can use:

sed '/^[[:space:]]*$/d' filename

to remove the unneeded blank lines between these lines.

Upvotes: -1

choroba

Reputation: 241918

Perl solution:

perl -pe 's/\\$// and chomp' < input > output

s/// is a substitution. \\ matches the backslash, $ matches end-of-line.
chomp removes a trailing newline, if present.

To also remove the leading whitespace, use this instead:

 's/^ +//; s/\\$// and chomp'

^ matches beginning-of-line. + matches one or more spaces.

Upvotes: 3

Etan Reisner

Reputation: 80931

The bash built-in read supports backslash-continued lines when you don't use -r (other than when you need exactly this support, you should always use -r).

So it should read those lines from a file, etc., just fine (assuming they don't contain other backslash escape sequences that need to be preserved).

$ while IFS= read line; do
    echo "[$line]"
done < <(printf 'Line1 asd asd asd \
    asd asd asd \

Line2 asd asd asd \
    asd asd asd \

Line3 asd asd asd \
    asd asd asd \

Line4 asd asd asd \
    asd asd asd \
')
[Line1 asd asd asd     asd asd asd ]
[Line2 asd asd asd     asd asd asd ]
[Line3 asd asd asd     asd asd asd ]
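
Note that the output above stops at Line3: the last logical line ends in a continuation, so read reaches the end of the input without seeing a terminating newline and returns a non-zero status, even though it has already stored the text in line. A common idiom for not losing such a final line is to test the variable as well. A small sketch, written to be POSIX-compatible and fed through a pipe rather than process substitution; print_logical_lines is a hypothetical name:

```shell
# Hypothetical helper: print each logical line in brackets.
# No -r on purpose: read itself joins backslash-continued lines.
# "|| [ -n "$line" ]" keeps a final line that lacks a terminating newline.
print_logical_lines() {
  while IFS= read line || [ -n "$line" ]; do
    printf '[%s]\n' "$line"
  done
}

printf 'Line1 asd \\\n    asd \\\n\nLine2 asd \\\n    asd \\\n' | print_logical_lines
```

With this change the last logical line (Line2 in this shortened sample) should be printed as well.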

Upvotes: 3
