Reputation: 2226
I've checked my regex on regex101.com and a few other testers, and it seems to be correct. I want to remove all words that are two characters long or shorter. My current implementation is:
head -n 10 abstracts.txt | sed 's/ [a-zA-Z]{1,2} //g'
And it's just not doing anything. I would like to go from something like this:
This is a short sentence.
To this:
This short sentence.
Thanks for any help.
Upvotes: 0
Views: 209
Reputation: 58483
This might work for you (GNU sed):
sed -e 's/\b\w\w\?\b\s\+\|\s\+\w\w\?$//g' file
This removes each one- or two-character word together with the spaces that follow it, or, at the end of a line, the preceding spaces together with the one- or two-character word.
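To see both branches of that alternation in action, here is a minimal sketch, assuming GNU sed. The helper name strip_short and the second sample sentence are made up for illustration and are not from the question.

```shell
# Assumes GNU sed: \b, \w, \s and the escaped \? \+ \| operators are GNU extensions.
# strip_short is a hypothetical helper name used only for this example.
strip_short() {
  sed -e 's/\b\w\w\?\b\s\+\|\s\+\w\w\?$//g'
}

# First branch: short words followed by spaces, anywhere in the line.
mid=$(printf '%s\n' 'This is a short sentence.' | strip_short)

# Second branch: a short word at the very end of the line.
end=$(printf '%s\n' 'A cat sat near me' | strip_short)

printf '%s\n' "$mid"   # This short sentence.
printf '%s\n' "$end"   # cat sat near
```

One caveat with this pattern: if the last word is short and the word before it is short too, the final word survives, because the space in front of it has already been consumed by the previous match.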
Upvotes: 0
Reputation: 41460
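For comparison, here is an awk version. It blanks every field shorter than three characters and then collapses the leftover runs of spaces:
echo 'This is a short sentence.' | awk '{for (i=1;i<=NF;i++) if (length($i)<3) $i="";gsub(/ +/," ")}1'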
This short sentence.
Upvotes: 0
Reputation: 7959
Don't match literal spaces; use \b for word boundaries, and escape the braces:
echo 'This is a short sentence' | sed -e 's/\b[a-zA-Z]\{1,2\}\b//g'
This   short sentence
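Because the \b version deletes only the words themselves, the spaces around them stay behind. A possible follow-up, sketched here under the assumption of GNU sed, is to chain two more expressions that squeeze repeated spaces and trim the line ends. The variable names and the second sample input are illustrative only.

```shell
# Assumes GNU sed for \b. The extra -e expressions are an addition to the
# answer above, not part of it.
clean=$(echo 'This is a short sentence' |
  sed -e 's/\b[a-zA-Z]\{1,2\}\b//g' \
      -e 's/  */ /g' \
      -e 's/^ //; s/ $//')

# A short word at the start of the line leaves a leading space, which the
# last expression trims.
lead=$(echo 'a cat is here' |
  sed -e 's/\b[a-zA-Z]\{1,2\}\b//g' \
      -e 's/  */ /g' \
      -e 's/^ //; s/ $//')

printf '%s\n' "$clean"   # This short sentence
printf '%s\n' "$lead"    # cat here
```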
Upvotes: 1
Reputation: 91488
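Escape the curly brackets and use a word boundary. By default sed uses basic regular expressions, in which an unescaped {1,2} is matched as literal text, so your pattern never matches: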
head -n 10 abstracts.txt | sed 's/ [a-zA-Z]\{1,2\}\b//g'
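A small side-by-side sketch of why the original command did nothing, assuming GNU sed (for \b). The sample line stands in for a line of abstracts.txt, and the variable names are made up for the example. The -E flag switches sed to extended regular expressions, where the braces work without backslashes.

```shell
line='This is a short sentence.'

# Basic regex (the default): unescaped braces are literal, so nothing matches.
unescaped=$(printf '%s\n' "$line" | sed 's/ [a-zA-Z]{1,2} //g')

# Basic regex with escaped braces: \{1,2\} is a repetition count.
escaped=$(printf '%s\n' "$line" | sed 's/ [a-zA-Z]\{1,2\}\b//g')

# Extended regex via -E: braces are repetition operators without escaping.
extended=$(printf '%s\n' "$line" | sed -E 's/ [a-zA-Z]{1,2}\b//g')

printf '%s\n' "$unescaped"   # This is a short sentence.
printf '%s\n' "$escaped"     # This short sentence.
printf '%s\n' "$extended"    # This short sentence.
```

Note that this pattern requires a leading space, so a short word at the very start of a line is left alone.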
Upvotes: 3