Ωmega

Reputation: 43663

Conditional replacement of newline characters with sed

I have a corrupted text file in which I need to replace \x20*[\n\r]+ with \xa0 if the next line (if one exists) does not start with the specific pattern DATA\t. If such a line starts with spaces (\x20+), those should also be removed.

Can I do this with sed? The text file is about 1 MB in size.


Data Example:

DATA     132942, "I love you", 2398, "Hi how are you"
DATA     78793, "It is 
me", 4322, "My name is Frank"
DATA     24121, "Where
   are
you", 52432, "I am

here"
DATA     43242, "End of story", 432432, "The end"

=>

DATA     132942, "I love you", 2398, "Hi how are you"
DATA     78793, "It is me", 4322, "My name is Frank"
DATA     24121, "Where are you", 52432, "I am here"
DATA     43242, "End of story", 432432, "The end"

Upvotes: 1

Views: 986

Answers (3)

potong

Reputation: 58351

This might work for you (GNU sed):

sed ':a;$!N;/\nDATA/!s/\s*\n\s*/ /;ta;P;D' file
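
For anyone unfamiliar with the N/P/D idiom, here is the same one-liner spelled out with comments. Treat it as a sketch: it assumes GNU sed (for \s and for comments inside the script), and the function name join_records is made up for illustration.

```shell
#!/bin/sh
# join_records (hypothetical name): reads the corrupted text on stdin and
# writes one record per line on stdout. Assumes GNU sed.
join_records() {
  sed '
    # Loop label.
    :a
    # Unless this is the last input line, append the next line to the
    # pattern space, separated by a newline.
    $!N
    # If the appended line does not begin with DATA, replace the newline
    # and any whitespace around it with a single space.
    /\nDATA/!s/\s*\n\s*/ /
    # If that substitution happened, loop back and pull in another line.
    ta
    # Otherwise print the finished record (everything up to the newline) ...
    P
    # ... then delete it and restart with the remainder, without reading
    # a new input line first.
    D
  '
}
```

It can be used as a filter, for example join_records < file > file.fixed, where both file names are placeholders.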

Upvotes: 1

konsolebox

Reputation: 75458

A way to do it in Ruby:

ruby -e 'puts File.read(ARGV.shift).gsub(/ *\r?\n *(?!DATA[[:space:]])/, " ").gsub(/ +$/m, "")' file

Output:

DATA    132942, "I love you", 2398, "Hi how are you"
DATA    78793, "It is me", 4322, "My name is Frank"
DATA    24121, "Where are you", 52432, "I am here"
DATA    43242, "End of story", 432432, "The end"

Upvotes: 1

mVChr

Reputation: 50167

sed '{:q;N;s/\x20*[\n\r]\+/\xa0/g;t q}' input.txt | sed 's/\xa0DATA/\nDATA/g'

Upvotes: 1
