econ

Reputation: 547

OSX sed newlines: why converting whitespace to newlines works, but converting newlines to spaces does not

sed on OSX has some quirks. This resource (http://nlfiedler.github.io/2010/12/05/newlines-in-sed-on-mac.html) contains information on how to convert whitespace into a newline:

 echo 'foo bar baz quux' | sed -e 's/ /\'$'\n/g'

Or (@ghoti's suggestion, which is easier to read):

echo 'foo bar baz quux' | sed -e $'s/ /\\\n/g'

However, when I try the reverse - converting newlines to whitespace, it doesn't work:

echo -e "foo\nbar" | sed -e 's/\'$'\n/ /g'

A more straightforward approach, simply matching \n, doesn't work either:

echo -e "foo\nbar" | sed -e 's/\n/ /g'

There's a related question here: https://superuser.com/questions/307165/newlines-in-sed-on-mac-os-x, with a detailed answer by Spiff (at the very end of the page); however, applying the same logic didn't resolve the problem.

Here's one way that does work on OSX (via http://www.benjiegillam.com/2011/09/using-sed-to-replace-newlines/):

 sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'
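The loop above can be read one `-e` expression at a time. The sketch below is the same command with each step annotated; the comments are shell comments kept outside the sed script, so BSD sed never has to parse them, and the three-line input is just an illustrative example.

```shell
# :a        defines a label named "a"
# N         appends a newline plus the next input line to the pattern space
# $!ba      on every line except the last, branches back to label "a"
# s/\n/ /g  once the whole input is in the pattern space, replaces every embedded newline
printf 'foo\nbar\nbaz\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'
# prints: foo bar baz
```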

However, I am still curious why reversing the original approach doesn't work.

UPDATE: here's how to make it work with two lines (the solution is to use N, which appends the next input line so that the newline between the two lines ends up inside the pattern space):

echo -e "foo\nbar\n" | sed -e 'N;s/\n/ /g'
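Note that a single N only joins lines in pairs: each cycle reads one line, N appends the next one, and the substitution runs on that two-line pattern space. A quick check with four lines (the input is only an example) shows why the :a/N/$!ba loop is needed for input of arbitrary length:

```shell
# Each cycle handles two input lines, so only every other newline is replaced.
printf 'a\nb\nc\nd\n' | sed -e 'N;s/\n/ /g'
# prints:
# a b
# c d
```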

AN ALTERNATIVE SOLUTION (see full answer by @ghoti for detailed explanation):

echo -e "foo\nbar\n" | sed -n '1h;2,$H;${;x;s/\n/ /gp;}'

However, this solution appears to be slightly slower than the one given in the question statement (note that the order in which these commands are run seems to matter, so it might make sense to time them in different orders):

time seq 10000 | sed -n '1h;2,$H;${;x;s/\n/ /gp;}' > /dev/null

time seq 10000 | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' > /dev/null
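Before comparing timings, it may be worth confirming that the two commands produce identical output on the same input. A minimal check, assuming a POSIX shell; the variable names hold and loop are just illustrative:

```shell
# Capture each command's output (command substitution strips only trailing newlines).
hold=$(seq 10000 | sed -n '1h;2,$H;${;x;s/\n/ /gp;}')
loop=$(seq 10000 | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')
if [ "$hold" = "$loop" ]; then
  echo "same output"
else
  echo "outputs differ"
fi
```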

Upvotes: 1

Views: 1898

Answers (3)

ghoti

Reputation: 46856

Your question appears to be "why doesn't the reverse of the original approach [of converting spaces to newlines] work?".

In sed, the newline is more of a record separator than part of the line. Consider that $, the null at the end of the pattern space, comes after the last character of the line; it is not the newline that terminates each line.

Sed commands that use newlines, like H, N, and even s, do so outside the scope of newline-as-record-separator. The records you're substituting are the text between the newlines.

In order to substitute a newline, then, you need to get it INSIDE the pattern space, using N, H, etc.
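One way to see this is sed's l command, which writes the pattern space in an unambiguous form, showing an embedded newline as \n and marking the end of the pattern space with $. A small sketch with two example lines:

```shell
# Without N, each pattern space holds one line and contains no newline at all.
printf 'foo\nbar\n' | sed -n 'l'
# prints:
# foo$
# bar$

# After N, the newline separating the two lines is inside the pattern space.
printf 'foo\nbar\n' | sed -n 'N;l'
# prints:
# foo\nbar$
```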

So here's an option.

printf 'foo\nbar\nbaz\n' | sed -n '1h;2,$H;${;x;s/\n/ /gp;}'

The idea is that we'll append all our lines to the hold buffer, then at the end of the file, move the hold buffer back to the pattern space for substitution, and replace the newlines with spaces all at once.
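Here is the same script split into separate -e expressions so each step can be annotated; the comments are shell comments outside the sed script, and splitting it this way should be accepted by both BSD and GNU sed:

```shell
# 1h           line 1: copy the pattern space into the hold space
# 2,$H         lines 2..last: append a newline plus the pattern space to the hold space
# ${ ... }     last line only:
#   x            swap hold and pattern space, so all input is now in the pattern space
#   s/\n/ /gp    replace every embedded newline with a space, then print
#                (-n suppresses sed's automatic printing)
printf 'foo\nbar\nbaz\n' | sed -n -e '1h' -e '2,$H' -e '${' -e 'x' -e 's/\n/ /gp' -e '}'
# prints: foo bar baz
```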

The 1h;2,$H construction avoids a leading space in your output, which would otherwise come from the newline that H inserts before each line of data it appends.
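To see the leading space it avoids, compare a variant that uses H on every line, including the first:

```shell
# H on line 1 appends a newline plus "foo" to the (empty) hold space, so the
# accumulated text starts with a newline, which the substitution turns into a space.
printf 'foo\nbar\nbaz\n' | sed -n 'H;${;x;s/\n/ /gp;}'
# prints " foo bar baz" (note the leading space)
```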

Upvotes: 1

Mark Setchell

Reputation: 207550

A couple of alternatives that I tend to fall back on when stymied by OSX sed peculiarities are tr and perl.

echo -e "foo\nbar" | tr '\n' ' '
foo bar

echo -e "foo\nbar" | perl -pe 's/\n/ /'
foo bar
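One thing to be aware of: both commands also replace the final newline, so the output actually ends in a space and has no trailing newline. Bracketing the result makes that visible (shown for tr; the variable name out is just illustrative):

```shell
# Command substitution strips trailing newlines but not spaces, so the brackets
# show exactly what tr produced.
out=$(printf 'foo\nbar\n' | tr '\n' ' ')
printf '[%s]\n' "$out"
# prints: [foo bar ]
```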

Upvotes: 0

Jonathan Leffler

Reputation: 754150

The GNU manual page for sed includes:

REGULAR EXPRESSIONS

POSIX.2 BREs should be supported, but they aren't completely because of performance problems. The \n sequence in a regular expression matches the newline character, and similarly for \a, \t, and other sequences.

The Mac OS X manual page for sed includes:

Sed Regular Expressions

The regular expressions used in sed, by default, are basic regular expressions (BREs, see re_format(7) for more information), but extended (modern) regular expressions can be used instead if the -E flag is given. In addition, sed has the following two additions to regular expressions:

  1. In a context address, any character other than a backslash (\) or newline character may be used to delimit the regular expression. Also, putting a backslash character before the delimiting character causes the character to be treated literally. For example, in the context address \xabc\xdefx, the RE delimiter is an x and the second x stands for itself, so that the regular expression is abcxdef.

  2. The escape sequence \n matches a newline character embedded in the pattern space. You cannot, however, use a literal newline character in an address or in the substitute command.

What these don't say, but what seems to be the case, is that in the s/regex/new/ command, the regex section is a regular expression, but the new section is not. In the replacement material, you have to use \ followed by a newline to embed a newline. In the search material (regex), you can use \n.
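A sketch of both sides, written so it should work with both BSD and GNU sed; the replacement newline is a literal newline inside single quotes, so no $'...' quoting is needed:

```shell
# Replacement side: a backslash followed by a literal newline inserts a newline.
printf 'foo bar\n' | sed -e 's/ /\
/'
# prints:
# foo
# bar

# Search side: \n matches a newline that N has already placed in the pattern space.
printf 'foo\nbar\n' | sed -e 'N' -e 's/\n/ /'
# prints: foo bar
```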

Note also that sed works on lines. By default, the newline that ends each input line is not part of the pattern space, so it is pretty much unmatchable except indirectly, via the regex metacharacter $; you can't simply remove that newline by matching it. You can, however, end up with multiple lines in the pattern space, and then you can match the embedded newlines with the \n pattern.
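For example, a substitution on \n finds nothing when each pattern space holds a single line, while $ still anchors at the end of every line:

```shell
# No embedded newline exists, so this substitution never matches and the input passes through.
printf 'foo\nbar\n' | sed -e 's/\n/ /g'
# prints:
# foo
# bar

# $ matches at the end of the pattern space, i.e. just before where the newline was.
printf 'foo\nbar\n' | sed -e 's/$/;/'
# prints:
# foo;
# bar;
```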

Upvotes: 1
