Reputation: 175
I have a source file with 15 columns. Some columns contain a newline character within the data, and I need to delete those newlines while preserving the record delimiter, which also happens to be the newline character.
I have seen a solution
sed -e :a -e '$!N; s/ *\n\([^"]\)/ \1/; ta' -e 'P;D' file
in the post below
Need to selectively remove newline characters from a file using unix (solaris)
but I can't comment there as I don't have enough reputation.
Could someone help me understand the sed command?
Thanks
Upvotes: 0
Views: 328
Reputation: 10039
This is a separate answer, for clarity (my other answer is the explanation of your sed command).
sed -e ':a
s/\([^,]*,\)\{14\}\(.*\)/&/
t rmNL
N
b a
: rmNL
s/\n//g' YourFile
Try this, BUT there must be no newline in the 15th field, because there is no way to know whether the following line is still part of that field or is the first line of a new record.
The script keeps loading new lines until the buffer holds 15 fields separated by commas (that is, 14 commas),
then removes every newline inside the buffer.
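As a rough illustration of the loop described above, here is a small sketch run on made-up data. The field values (r1, f2, f3a, ...) and the variable names rest and out are hypothetical; the sed script is the one from this answer, with the label written as :rmNL.

```shell
# Hypothetical 15-field sample: record r1 is split inside its 3rd field.
rest='f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15'

# Keep appending lines (N) until the buffer holds 14 commas,
# then jump to rmNL and strip the embedded newlines.
out=$(printf '%s\n' "r1,f2,f3a" "f3b,$rest" "r2,f2,f3,$rest" |
  sed -e ':a
s/\([^,]*,\)\{14\}\(.*\)/&/
t rmNL
N
b a
:rmNL
s/\n//g')

printf '%s\n' "$out"
# prints:
# r1,f2,f3af3b,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15
# r2,f2,f3,f4,f5,f6,f7,f8,f9,f10,f11,f12,f13,f14,f15
```

Note that the two halves of the broken field are joined with nothing in between (f3af3b), since the newline is simply deleted.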
Upvotes: 0
Reputation: 10039
-e
the string that follows is a list of actions (sed commands)
:a
define a label (for a goto jump)
$!N
if not on the last line, append the next input line to the working buffer, separated by a newline, so the following actions see both lines (on the last line, skip this and go to the next action)
s/ *\n\([^"]\)/ \1/
replace any number of spaces (including none) + a newline + one character that is not "
[that character is kept in memory n°1] with a single space + the content of memory n°1
ta
if there was a replacement, go to label a
(repeat the cycle up to this point)
P
Print first line of current working buffer
D
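Delete the first line of the working buffer and restart the action list with what remains, without reading a new line first (if nothing remains, the next line is loaded as usual)
So this sed joins a line to the previous one whenever it matches that sequence. In fact, because * also matches zero spaces, it replaces every newline in the file that is followed by a character other than " with a single space.
A line that starts with " is left alone; that " is most likely your field delimiter, so it marks the start of a new record.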
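To see the steps above in action, here is a minimal sketch on made-up input. The three sample lines and the variable names input and out are hypothetical, and it assumes every record starts with a double quote, as the command expects.

```shell
# Hypothetical sample: record 1 is split across two lines inside a quoted field.
input=$(printf '%s\n' '"a","b' 'c","d"' '"e","f"')

# The newline before c is followed by a non-quote character, so it becomes a space.
# The newline before "e" is followed by a quote, so it is kept as a record delimiter.
out=$(printf '%s\n' "$input" |
  sed -e :a -e '$!N; s/ *\n\([^"]\)/ \1/; ta' -e 'P;D')

printf '%s\n' "$out"
# prints:
# "a","b c","d"
# "e","f"
```

A record whose continuation line happens to start with " would not be joined, so this only works when that never occurs inside a field.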
Upvotes: 1