Docslayer

Reputation: 21

Regular Expression to remove all lines containing only 1 word

I am trying to write a regex that matches any line containing exactly one word. A word joined by a hyphen or other symbol (e.g. test-word) should still count as a single word, and leading whitespace should be ignored.

$ cat file1
this line has many words
hello
  test-hi
this does aswell

Using the regular expression

'/^\s*(\w+)\s$/GM'

returns only "hello" and ignores "test-hi".

I can capture plain single words, but not ones containing hyphens or other symbols.

Upvotes: 1

Views: 555

Answers (3)

Sundeep

Reputation: 23677

This is easier to do with awk. By default it splits each record into fields on one or more consecutive whitespace characters, and whitespace at the start or end of a line is not counted when determining the fields.

$ awk 'NF==1' ip.txt
hello
  test-hi
$ awk 'NF>1' ip.txt
this line has many words
this does aswell

NF is a built-in variable that holds the number of fields in the current input record.
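
For readers who want the same field-counting idea outside awk, here is a rough Python analogue. It is only a sketch: `field_count` is a hypothetical helper name, and it relies on `str.split()` with no arguments, which splits on runs of whitespace and ignores leading and trailing whitespace, much like awk's default field splitting.

```python
sample = """this line has many words
hello
  test-hi
this does aswell
"""

def field_count(line):
    # str.split() with no arguments splits on runs of whitespace and
    # drops leading/trailing whitespace, similar to awk's default fields.
    return len(line.split())

lines = sample.splitlines()
single = [line for line in lines if field_count(line) == 1]  # like NF==1
multi = [line for line in lines if field_count(line) > 1]    # like NF>1

print(single)
print(multi)
```

Blank lines have a field count of 0, so they land in neither list, which matches how `NF==1` and `NF>1` treat them.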

Upvotes: 4

Tim Biegeleisen

Reputation: 522151

Try using \S, which matches any non-whitespace character:

'/^\s*(\S+)\s$/GM'
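
To see the difference between \w and \S on the sample lines, here is a small Python sketch. Two assumptions are mine rather than part of the answer: each line is tested on its own with `re.fullmatch` instead of relying on multiline flags, and the trailing `\s` is relaxed to `\s*` so that lines with no trailing whitespace also match. `single_words` is a hypothetical helper name.

```python
import re

lines = ["this line has many words", "hello", "  test-hi", "this does aswell"]

# Assumption: trailing \s relaxed to \s* (see the note above).
non_space = re.compile(r"\s*(\S+)\s*")  # any run of non-whitespace
word_only = re.compile(r"\s*(\w+)\s*")  # letters, digits, underscore only

def single_words(pattern, lines):
    # Collect the captured word from every line the pattern matches in full.
    found = []
    for line in lines:
        m = pattern.fullmatch(line)
        if m:
            found.append(m.group(1))
    return found

print(single_words(non_space, lines))  # ['hello', 'test-hi']
print(single_words(word_only, lines))  # ['hello']
```

Note that \S+ accepts any symbol at all, so a line such as "foo@bar" would also count as one word.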

Upvotes: 1

ggorlen

Reputation: 57155

You can use

^\s*([\w-]+)\s*$

which adds support for hyphens and makes the second \s match zero or more whitespace characters. Keep your GM flags.
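
Since the goal is to remove the single-word lines, here is a minimal Python sketch of one way to apply this pattern. It assumes the input is processed line by line (so no multiline flag is needed), and `drop_single_word_lines` is a hypothetical name chosen for illustration.

```python
import re

# The pattern from above, applied to one line at a time.
SINGLE_WORD = re.compile(r"^\s*([\w-]+)\s*$")

def drop_single_word_lines(text):
    # Keep only the lines that are NOT a single (possibly hyphenated) word.
    kept = [line for line in text.splitlines() if not SINGLE_WORD.match(line)]
    return "\n".join(kept)

sample = "this line has many words\nhello\n  test-hi\nthis does aswell\n"
print(drop_single_word_lines(sample))
```

Empty lines are kept, because the pattern requires at least one word character or hyphen.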


Upvotes: 1
