HenryM

Reputation: 111

Bash: how to delete whitespace-only lines and strip a leading white space from other lines

I'm parsing long texts, and previous commands leave two kinds of unwanted byproducts: (1) lines that contain only a single white space, and (2) lines that start with a single white space followed by a sentence.

How do I delete the first kind and remove the leading space from the second?

I have tried the following:

tr -s [:space:] |sed -r 's/\^ /\^/g' > output.txt

and the following

tr -s [:space:] |sed -r 's/\n //g' > output.txt

and the following

sed 's/\([.!?]\)[[:space:]]*/\1\n/g' file > output.txt

None of these worked.
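A likely reason the first two attempts fail: in `s/\^ /\^/g` the backslash turns `^` into a literal caret character instead of the start-of-line anchor, and in `s/\n //g` the pattern can never match, because sed reads input one line at a time and strips the trailing newline before running the script. A small sketch comparing the escaped and unescaped caret (the sample string is only illustrative):

```shell
#!/usr/bin/env bash
# Illustrative input: one line that starts with a single space.
line=' Sir William Blackstone,'

# Escaped caret: \^ matches a literal '^' character, so nothing is removed.
escaped=$(printf '%s\n' "$line" | sed -E 's/\^ //')

# Unescaped caret: ^ anchors the match to the start of the line.
anchored=$(printf '%s\n' "$line" | sed -E 's/^ //')

printf 'escaped:  [%s]\n' "$escaped"
printf 'anchored: [%s]\n' "$anchored"
```

Only the anchored version drops the leading space.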

Sample input (underscores represent spaces for readability):

_Sir_William_Blackstone,
_
_Commentaries_on_the

Expected output:

Sir_William_Blackstone,
Commentaries_on_the

Upvotes: 0

Views: 118

Answers (1)

Paul Hodges

Reputation: 15246

I'd use sed.

sed -E '/^\s*$/d; s/^\s*//;' < in > out

This deletes lines that are empty or contain only whitespace, and strips leading whitespace from all other lines.
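A sketch of the same two-step idea run against the question's sample input, with the underscores turned back into real spaces. It swaps `\s` (a GNU sed extension) for the POSIX bracket expression `[[:space:]]`, on the assumption that portability to non-GNU sed may matter; the file names `in` and `out` follow the answer's command:

```shell
#!/usr/bin/env bash
# Sample input from the question, with underscores replaced by real spaces.
printf ' Sir William Blackstone,\n \n Commentaries on the\n' > in

# Same logic as the answer, using [[:space:]] instead of the GNU-only \s:
#   1. delete lines that are empty or contain only whitespace
#   2. strip leading whitespace from the remaining lines
sed -E '/^[[:space:]]*$/d; s/^[[:space:]]+//' < in > out

cat out
```

The whitespace-only middle line is gone and the two remaining lines start directly with text, matching the expected output.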

Cf. the GNU sed manual: https://www.gnu.org/software/sed/manual/sed.html

There are refinements, but this is the general idea.
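One possible refinement, assuming you want to touch only the exact cases described in the question (a line that is exactly one space, or a single leading space), is a narrower variant. This is a hypothetical variant, not part of the answer above; it deliberately leaves other leading whitespace, such as tabs, untouched:

```shell
#!/usr/bin/env bash
# Hypothetical stricter variant: only a single leading space is handled.
input=$' Sir William Blackstone,\n \n Commentaries on the\n\tindented with a tab'

# /^ $/d   deletes lines consisting of exactly one space
# s/^ //   removes one leading space from any other line
result=$(printf '%s\n' "$input" | sed -e '/^ $/d' -e 's/^ //')

printf '%s\n' "$result"
```

This uses only basic regular expressions, so it should behave the same under GNU and BSD sed.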

Upvotes: 1
