cigrainger

Reputation: 2226

Regex correct but not working in sed for 2-character words

I've used regex101.com and a few other sites to check that this regex is correct, and it seems to be. I want to remove all words that are two characters long or shorter. My current implementation is:

head -n 10 abstracts.txt | sed 's/ [a-zA-Z]{1,2} //g'

But it doesn't change anything. I would like to go from something like this:

This is a short sentence.

To this:

This short sentence.

Thanks for any help.

Upvotes: 0

Views: 209

Answers (4)

potong

Reputation: 58483

This might work for you (GNU sed):

sed -e 's/\b\w\w\?\b\s\+\|\s\+\w\w\?$//g' file

This removes each one- or two-character word together with the spaces that follow it anywhere in a line, and, at the end of a line, a one- or two-character word together with the spaces that precede it.
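Wrapping the command in a small function makes it easy to check each alternative of the pattern. This is a sketch assuming GNU sed, since \b, \w, \s, \?, \+ and \| are GNU extensions to basic regular expressions; strip_short_gnu is a hypothetical name introduced here, not part of the answer.

```shell
#!/bin/sh
# Assumes GNU sed: \b, \w, \s, \?, \+ and \| are GNU extensions to BRE.
# strip_short_gnu is a hypothetical wrapper around the answer's command.
strip_short_gnu() {
  sed -e 's/\b\w\w\?\b\s\+\|\s\+\w\w\?$//g'
}

# First alternative: a short word plus the spaces after it.
echo 'This is a short sentence.' | strip_short_gnu   # prints: This short sentence.
# Second alternative: spaces plus a short word at the end of the line.
echo 'Sort it' | strip_short_gnu                     # prints: Sort
# \w also matches digits and underscores, so "42" is removed too.
echo 'I have 42 cats' | strip_short_gnu              # prints: have cats
```

Unlike the [a-zA-Z] class in the question, \w treats digits and underscores as word characters, so short numbers are stripped as well.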

Upvotes: 0

Jotne

Reputation: 41460

Just for testing, here is an awk version:

echo 'This is a short sentence.' | awk '{for (i=1;i<=NF;i++) if (length($i)<3) $i="";gsub(/  +/," ")}1'
This short sentence.
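When the first or last word is short, blanking that field leaves a leading or trailing space. A possible variant, sketched here with a hypothetical helper name strip_short_awk, also trims the ends; it relies only on POSIX awk behavior (assigning to a field rebuilds $0 with OFS, a single space by default).

```shell
#!/bin/sh
# strip_short_awk is a hypothetical helper, a variant of the awk answer.
strip_short_awk() {
  awk '{
    for (i = 1; i <= NF; i++)
      if (length($i) < 3) $i = ""   # assigning a field rebuilds $0 with OFS
    gsub(/ +/, " ")                  # collapse the runs of spaces left behind
    sub(/^ /, ""); sub(/ $/, "")     # drop a leading or trailing space
  } 1'
}

echo 'This is a short sentence.' | strip_short_awk   # prints: This short sentence.
echo 'An apple a day' | strip_short_awk              # prints: apple day
```

Note that length($i) counts punctuation, so a token such as "it." (three characters) is kept.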

Upvotes: 0

Tiago Lopo

Reputation: 7959

Don't match literal spaces; use \b for word boundaries:

echo 'This is a short sentence' | sed -e 's/\b[a-zA-Z]\{1,2\}\b//g'
This   short sentence
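The output above still contains the spaces that surrounded the removed words. One way to tidy them up, sketched under the assumption of GNU sed (\b is a GNU extension), is to chain a few more expressions that squeeze and trim spaces; strip_short_words is a hypothetical helper name.

```shell
#!/bin/sh
# Assumes GNU sed for \b. strip_short_words is a hypothetical helper.
# Expressions, in order: delete 1-2 letter words, squeeze runs of spaces
# to one, then remove a leading and a trailing space.
strip_short_words() {
  sed -e 's/\b[a-zA-Z]\{1,2\}\b//g' \
      -e 's/  */ /g' \
      -e 's/^ //' \
      -e 's/ $//'
}

echo 'This is a short sentence.' | strip_short_words   # prints: This short sentence.
```

Because \b treats an apostrophe as a boundary, a contraction such as It's is reduced to a lone apostrophe.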

Upvotes: 1

Toto

Reputation: 91488

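In sed's default basic regular expressions, an unescaped {1,2} matches literal braces. Escape the curly brackets and use a word boundary instead of the trailing space: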

head -n 10 abstracts.txt | sed 's/ [a-zA-Z]\{1,2\}\b//g'
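The commands below illustrate why the original command had no effect and why the word boundary matters. They assume GNU sed (for \b); -E, which switches to extended syntax, is accepted by both GNU and BSD sed.

```shell
#!/bin/sh
# Basic syntax: unescaped {1,2} is literal text, so the question's pattern
# only matches a space, one letter, the text "{1,2}", and a space.
echo 'x a{1,2} y' | sed 's/ [a-zA-Z]{1,2} //g'                      # prints: xy

# Extended syntax: the interval works, but the trailing space is consumed,
# so the next short word ("a") has no leading space left and survives.
echo 'This is a short sentence.' | sed -E 's/ [a-zA-Z]{1,2} //g'    # prints: Thisa short sentence.

# The answer's fix in basic syntax, and the same pattern with -E.
echo 'This is a short sentence.' | sed 's/ [a-zA-Z]\{1,2\}\b//g'   # prints: This short sentence.
echo 'This is a short sentence.' | sed -E 's/ [a-zA-Z]{1,2}\b//g'  # prints: This short sentence.

# A short word at the very start of a line has no leading space, so it stays.
echo 'It is fine' | sed 's/ [a-zA-Z]\{1,2\}\b//g'                  # prints: It fine
```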

Upvotes: 3
