zsquare

Reputation: 10146

regex to filter sentences based on word length

I'm trying to figure out a regex to match strings in which every word is at most some maximum length.

For example, if the value is 6, the regex should match "this is a test string" but not "this is another test string", because "another" is 7 characters long, which is more than 6.

Upvotes: 1

Views: 3315

Answers (5)

Sarah Groß

Reputation: 10879

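[^\s]{7,} (equivalently \S{7,}) should do the trick for a limit of 6: it finds any run of seven or more non-whitespace characters, so a string should be rejected whenever it matches. It counts every character other than whitespace, though, so commas etc. are included in the word length unless you add them to the square brackets.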
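A minimal Python sketch of this approach, assuming Python's `re` module and the question's limit of 6 (so the threshold is one more, 7). The helper `has_long_word` is a hypothetical name. The pattern is used with a search rather than a full match, and a sentence is kept only when the search finds nothing:

```python
import re


def has_long_word(text, max_len):
    # Hypothetical helper: \S{n,} is the same as [^\s]{n,}. With
    # n = max_len + 1 it finds any run of non-whitespace characters
    # that is longer than the limit.
    return re.search(r"\S{%d,}" % (max_len + 1), text) is not None


sentences = ["this is a test string", "this is another test string"]

# Keep only the sentences that contain no over-long word.
kept = [s for s in sentences if not has_long_word(s, 6)]
print(kept)  # ['this is a test string']
```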

Upvotes: 0

stema

Reputation: 92976

One possibility is to use a negative lookahead:

^(?!.*\b\w{7,}\b).+$

See and test it here on Regexr

This takes a different approach: the ^.+$ part accepts everything (at least one character, because of the +; change it to * if you also want to accept the empty string).

Then I add an assertion to the expression, (?!.*\b\w{7,}\b). An assertion does not consume any characters; it only checks whether a condition holds. Here the condition is that nowhere in the string is there a run of 7 or more word characters.

(?!...): negative lookahead assertion

\w: a word character. What it covers depends on your language, but it includes at least a-z, A-Z, 0-9 and _. In some languages \w also includes all Unicode letters and digits. See the page on character classes at regular-expression.info.

\b: a word boundary, i.e. the transition from a word character to a non-word character or vice versa.
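A short Python sketch of the same pattern, assuming Python's `re` module. `words_within` is a hypothetical helper that builds the lookahead from a limit, so a limit of 6 reproduces \w{7,} exactly:

```python
import re


def words_within(text, limit):
    # Hypothetical helper. The negative lookahead fails the whole match if a
    # run of limit + 1 or more word characters occurs anywhere; .+ then
    # accepts any non-empty string that passed the check.
    pattern = r"^(?!.*\b\w{%d,}\b).+$" % (limit + 1)
    return re.match(pattern, text) is not None


for s in ["this is a test string", "this is another test string"]:
    print(s, "->", words_within(s, 6))
```

Because only \w characters are counted, punctuation such as a trailing comma does not add to a word's length with this pattern.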

Upvotes: 0

ynka

Reputation: 1497

^\w{1,5}(\s+\w{1,5})*$

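This should match strings of one or more words separated by whitespace, each at most 5 characters long (use {1,6} to also allow six-letter words such as "string" from the question's example).

It works at least in languages that support the {n,m} syntax, such as Java or Perl.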
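A minimal Python sketch of this pattern with the limit as a parameter (Python's `re` also supports the {n,m} syntax). `only_short_words` is a hypothetical name:

```python
import re


def only_short_words(text, limit):
    # Hypothetical helper. One word of 1..limit word characters, followed by
    # any number of (whitespace, word) pairs; ^ and $ anchor the whole string.
    pattern = r"^\w{1,%d}(\s+\w{1,%d})*$" % (limit, limit)
    return re.match(pattern, text) is not None


print(only_short_words("this is a test string", 6))        # True
print(only_short_words("this is another test string", 6))  # False
print(only_short_words("this is a test string", 5))        # False: "string" has 6 letters
```

Since the separators must be whitespace, a comma or full stop anywhere in the sentence also makes it fail.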

Upvotes: 2

Toto

Reputation: 91385

How about:

^(?:\b\S{1,5}\b\s*)+$

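Explanation:

^           : start of string
(?:         : start of non-capturing group
  \b        : word boundary
  \S{1,5}   : one to five non-whitespace characters
  \b        : word boundary
  \s*       : zero or more whitespace characters
)+          : end of group, repeated one or more times
$           : end of string

Change {1,5} to {1,6} to also accept six-letter words such as "string" from the question's example.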
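The same pattern in a minimal Python sketch, assuming Python's `re` module. `only_short_tokens` is a hypothetical helper that substitutes a limit parameter for the 5:

```python
import re


def only_short_tokens(text, limit):
    # Hypothetical helper: each repetition of the group consumes one token of
    # 1..limit non-whitespace characters between word boundaries, plus any
    # trailing whitespace.
    pattern = r"^(?:\b\S{1,%d}\b\s*)+$" % limit
    return re.match(pattern, text) is not None


print(only_short_tokens("this is a test string", 6))        # True
print(only_short_tokens("this is another test string", 6))  # False
```

Because there is no word boundary between a comma and the space after it, a sentence such as "test, ok" does not match this pattern.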

Upvotes: 4

user130076


The exact syntax of the regular expression depends on the language you're using, but this is certainly possible. The following example is in Python:

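import re

def match_string_length(value, string):
    # Words of 1..value ASCII letters separated by single spaces.
    # [A-Za-z] rather than [A-z], which also matches [ \ ] ^ _ and `.
    pattern = re.compile(r'[A-Za-z]{1,%d}(?: [A-Za-z]{1,%d})*' % (value, value))
    # fullmatch requires the whole string to match, not just a prefix.
    return pattern.fullmatch(string) is not None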

This should be enough to get you started on a function that fully meets your requirements. As written, it will not handle strings containing digits, punctuation, or other special characters.
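For comparison, a sketch that is not part of the answer above and does not use a regex at all: splitting on whitespace and checking each word's length also copes with digits and punctuation. `all_words_short` is a hypothetical name:

```python
def all_words_short(text, limit):
    # Hypothetical non-regex alternative. str.split() with no argument splits
    # on runs of whitespace and discards leading/trailing whitespace.
    words = text.split()
    # Require at least one word, then check every word against the limit.
    return bool(words) and all(len(word) <= limit for word in words)


print(all_words_short("this is a test string", 6))        # True
print(all_words_short("this is another test string", 6))  # False
```

Like the \S-based answers, this counts punctuation attached to a word toward its length.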

Upvotes: 0
