Reputation: 547
sed on OSX has some quirks. This resource (http://nlfiedler.github.io/2010/12/05/newlines-in-sed-on-mac.html) contains information on how to convert whitespace into a newline:
echo 'foo bar baz quux' | sed -e 's/ /\'$'\n/g'
OR (@ghoti's suggestion which does make it easier to read):
echo 'foo bar baz quux' | sed -e $'s/ /\\\n/g'
However, when I try the reverse - converting newlines to whitespace, it doesn't work:
echo -e "foo\nbar" | sed -e 's/\'$'\n/ /g'
A more straightforward approach of just changing \n doesn't work either:
echo -e "foo\nbar" | sed -e 's/\n/ /g'
There's a related answer here: https://superuser.com/questions/307165/newlines-in-sed-on-mac-os-x, with a detailed answer by Spiff (right at the end of the page), however applying the same logic didn't resolve the problem.
Here's one way that does work on OSX (via http://www.benjiegillam.com/2011/09/using-sed-to-replace-newlines/):
sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'
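For reference, the same loop can be written as a commented multi-line script. This is only a sketch of how I read the one-liner; the variable name `joined` is made up for illustration:

```shell
# Join all input lines with spaces using the :a / N / $!ba loop.
joined=$(printf 'foo\nbar\nbaz\n' | sed -e '
# define a label named "a" to jump back to
:a
# append the next input line to the pattern space, separated by a newline
N
# if this is not yet the last line, branch back to label "a"
$!ba
# everything is in the pattern space now, so the embedded newlines can be matched
s/\n/ /g
')
printf '[%s]\n' "$joined"   # [foo bar baz]
```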
However, I am still curious why reversing the original approach doesn't work.
UPDATE: here's how to make it work with two lines (the solution is to use N to embed the newline characters):
echo -e "foo\nbar\n" | sed -e 'N;s/\n/ /g'
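Note that a single N only joins lines in pairs, which is why longer input needs a loop or the hold space. A quick check with four lines (the variable name `pairs` is just for illustration):

```shell
# Each cycle reads one line, N appends exactly one more, and the pair is printed.
pairs=$(printf '1\n2\n3\n4\n' | sed 'N;s/\n/ /')
printf '%s\n' "$pairs"
# 1 2
# 3 4
```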
AN ALTERNATIVE SOLUTION (see full answer by @ghoti for detailed explanation):
echo -e "foo\nbar\n" | sed -n '1h;2,$H;${;x;s/\n/ /gp;}'
However, this solution appears to be a tiny bit slower than the loop-based one shown earlier (the order in which the commands are run may affect the timings, so it is worth testing them in both orders):
time seq 10000 | sed -n '1h;2,$H;${;x;s/\n/ /gp;}' > /dev/null
time seq 10000 | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' > /dev/null
Upvotes: 1
Views: 1898
Reputation: 46856
Your question appears to be "why doesn't the reverse of the original approach [of converting spaces to newlines] work?".
In sed, the newline is more of a record separator than part of the line. Consider that $, the null at the end of the pattern space, comes after the last character of the line; it is not a newline at the end of every line.
Sed commands that utilize newlines, like H and N and even s, do so outside the scope of newline-as-record-separator. The records you're substituting are between the newlines.
In order to substitute a newline, then, you need to get it INSIDE the pattern space, using N, H, etc.
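A small sketch of that point, assuming nothing beyond a POSIX-ish sed (the variable names are illustrative): with one line per cycle there is no newline in the pattern space for \n to match, while $ still anchors at the end of each record.

```shell
# s/\n/X/g finds nothing: each cycle sees "foo" or "bar" without a newline.
no_match=$(printf 'foo\nbar\n' | sed 's/\n/X/g')
# s/$/|/ marks the end of each record; the separator is added back on output.
end_mark=$(printf 'foo\nbar\n' | sed 's/$/|/')
printf '%s\n' "$no_match"   # foo, then bar, unchanged
printf '%s\n' "$end_mark"   # foo|, then bar|
```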
So here's an option.
printf 'foo\nbar\nbaz\n' | sed -n '1h;2,$H;${;x;s/\n/ /gp;}'
The idea is that we'll append all our lines to the hold buffer, then at the end of the file, move the hold buffer back to the pattern space for substitution, and replace the newlines with spaces all at once.
The 1h;2,$H construction avoids a blank at the beginning of your output, caused by the newline that H inserts before each line of data it appends.
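To see the difference, here is a rough comparison of using H for every line versus the 1h;2,$H form (the variable names are only for illustration):

```shell
# H on every line: the hold space starts empty, so the first H yields "\nfoo",
# and that leading newline turns into a leading space.
with_H=$(printf 'foo\nbar\nbaz\n' | sed -n 'H;${x;s/\n/ /g;p;}')
# 1h copies the first line instead of appending it, so no leading newline.
with_1h=$(printf 'foo\nbar\nbaz\n' | sed -n '1h;2,$H;${x;s/\n/ /g;p;}')
printf '[%s]\n' "$with_H"    # [ foo bar baz]
printf '[%s]\n' "$with_1h"   # [foo bar baz]
```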
Upvotes: 1
Reputation: 207550
A couple of alternatives that I tend to fall back on when stymied by OSX sed peculiarities are tr and perl.
echo -e "foo\nbar" | tr '\n' ' '
foo bar
echo -e "foo\nbar" | perl -pe 's/\n/ /'
foo bar
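One thing to watch: both of these also convert the final newline, so the output ends in a space rather than a newline. A sketch of checking for that and trimming it (the sed step and the variable names are my own additions, not required):

```shell
# Command substitution strips trailing newlines but keeps the trailing space.
with_tr=$(printf 'foo\nbar\n' | tr '\n' ' ')
# Optionally drop the trailing space afterwards.
trimmed=$(printf 'foo\nbar\n' | tr '\n' ' ' | sed 's/ $//')
printf '[%s]\n' "$with_tr"   # [foo bar ]
printf '[%s]\n' "$trimmed"   # [foo bar]
```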
Upvotes: 0
Reputation: 754150
The GNU manual page for sed includes:
REGULAR EXPRESSIONS
POSIX.2 BREs should be supported, but they aren't completely because of performance problems. The \n sequence in a regular expression matches the newline character, and similarly for \a, \t, and other sequences.
The Mac OS X manual page for sed includes:
Sed Regular Expressions
The regular expressions used in sed, by default, are basic regular expressions (BREs, see re_format(7) for more information), but extended (modern) regular expressions can be used instead if the -E flag is given. In addition, sed has the following two additions to regular expressions:
1. In a context address, any character other than a backslash (\) or newline character may be used to delimit the regular expression. Also, putting a backslash character before the delimiting character causes the character to be treated literally. For example, in the context address \xabc\xdefx, the RE delimiter is an x and the second x stands for itself, so that the regular expression is abcxdef.
2. The escape sequence \n matches a newline character embedded in the pattern space. You cannot, however, use a literal newline character in an address or in the substitute command.
What these don't say, but what seems to be the case, is that in the s/regex/new/ command, the regex section is a regular expression, but the new section is not. In the replacement material, you have to use \ followed by a newline to embed a newline. In the search material (regex), you can use \n.
Note also that sed works on lines. By default, the newline at the end of the pattern space is pretty much unmatchable except with the regex metacharacter $; you can't simply remove that newline by matching it. You can, however, end up with multiple lines in the pattern space, and then you can match embedded newlines with the \n pattern.
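A minimal sketch of the two cases, which should behave the same with both the Mac OS X and GNU versions of sed (the variable names are illustrative):

```shell
# Replacement side: a backslash followed by a literal newline inserts a newline.
split=$(printf 'foo bar\n' | sed 's/ /\
/')
# Regex side: \n matches the embedded newline once N has appended the next line.
rejoined=$(printf 'foo\nbar\n' | sed 'N;s/\n/ /')
printf '%s\n' "$split"      # foo, then bar on the next line
printf '%s\n' "$rejoined"   # foo bar
```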
Upvotes: 1