Reputation: 43663
I have a corrupted text file in which I need to replace \x20*[\n\r]+ with \xa0 if the next line (if it exists) does not start with the specific pattern DATA\t. If such a line starts with spaces (\x20+), those should also be removed. Can I do it with sed? The text file is about 1 MB in size.
Data Example:
DATA 132942, "I love you", 2398, "Hi how are you"
DATA 78793, "It is
me", 4322, "My name is Frank"
DATA 24121, "Where
are
you", 52432, "I am
here"
DATA 43242, "End of story", 432432, "The end"
=>
DATA 132942, "I love you", 2398, "Hi how are you"
DATA 78793, "It is me", 4322, "My name is Frank"
DATA 24121, "Where are you", 52432, "I am here"
DATA 43242, "End of story", 432432, "The end"
Upvotes: 1
Views: 986
Reputation: 58351
This might work for you (GNU sed):
sed ':a;$!N;/\nDATA/!s/\s*\n\s*/ /;ta;P;D' file
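The one-liner is dense, so here is a rough Python sketch of the same idea, in case a readable version helps: append each line to the previous record unless it starts with DATA followed by whitespace. The function name join_records and its parameters are my own choices for illustration, not anything taken from sed or the question. It joins with a plain space, as in the expected output; pass joiner="\xa0" if a non-breaking space is really wanted.

```python
import re

def join_records(text, prefix="DATA", joiner=" "):
    """Merge continuation lines into the preceding record.

    A line starts a new record only if it begins with `prefix`
    followed by whitespace (tab or space). Any other line has its
    surrounding spaces removed and is appended to the current record.
    """
    starts_record = re.compile(re.escape(prefix) + r"\s")
    records = []
    # splitlines() copes with \n, \r\n and bare \r line endings
    for line in text.splitlines():
        if starts_record.match(line) or not records:
            # New record (or the very first line): drop trailing spaces
            records.append(line.rstrip(" "))
        else:
            piece = line.strip(" ")
            if piece:  # blank lines are dropped, like [\n\r]+ in the question
                records[-1] += joiner + piece
    return "\n".join(records) + "\n"
```

To run it on a file, read the whole file into a string (about 1 MB is no problem) and print the result of join_records on it.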
Upvotes: 1
Reputation: 75458
A way to do it in Ruby:
ruby -e 'puts File.read(ARGV.shift).gsub(/ *\r?\n *(?!DATA[[:space:]])/, " ").gsub(/ +$/m, "")' file
Output:
DATA 132942, "I love you", 2398, "Hi how are you"
DATA 78793, "It is me", 4322, "My name is Frank"
DATA 24121, "Where are you", 52432, "I am here"
DATA 43242, "End of story", 432432, "The end"
Upvotes: 1
Reputation: 50167
sed ':q;N;s/\x20*[\n\r]\+\x20*/\xa0/g;t q' input.txt | sed 's/\xa0DATA/\nDATA/g'
Upvotes: 1