Reputation: 111
I'm parsing long texts, and previous commands leave behind two kinds of unwanted lines: 1) lines containing only a single space, and 2) lines that begin with a single space followed by a sentence.
How do I get rid of them?
I have tried the following:
tr -s [:space:] |sed -r 's/\^ /\^/g' > output.txt
and the following
tr -s [:space:] |sed -r 's/\n //g' > output.txt
and the following
sed 's/\([.!?]\)[[:space:]]*/\1\n/g' file > output.txt
No success.
Sample input (underscores represent spaces, for readability):
_Sir_William_Blackstone,
_
_Commentaries_on_the
Sample output
Sir_William_Blackstone,
Commentaries_on_the
Upvotes: 0
Views: 118
Reputation: 15246
I'd use sed.
sed -E '/^\s*$/d; s/^\s*//;' < in > out
This deletes lines with only whitespace, and strips whitespace off the beginning of other lines.
cf. https://www.gnu.org/software/sed/manual/sed.html
There are refinements, but this is the general idea.
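One possible refinement, as a sketch: \s is a GNU extension, so on a non-GNU sed the POSIX class [[:space:]] may be the safer choice. The function name strip_blank_and_leading below is made up for illustration, and the input is the sample from the question with real spaces in place of the underscores.

```shell
# Hypothetical helper (the name is an assumption, not from the question).
# Reads stdin, writes stdout.
strip_blank_and_leading() {
  # 1st expression: delete lines that are empty or contain only whitespace.
  # 2nd expression: strip leading whitespace from the remaining lines.
  sed -e '/^[[:space:]]*$/d' -e 's/^[[:space:]]*//'
}

# Sample input from the question, with real spaces instead of underscores.
printf ' Sir William Blackstone,\n \n Commentaries on the\n' | strip_blank_and_leading
# Prints:
# Sir William Blackstone,
# Commentaries on the
```

To run it on a file, redirect as before: strip_blank_and_leading < in > out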
Upvotes: 1