Alex

Reputation: 1999

How do I capture the first N words in a string using regex?

I have a regex that looks like this (word characters, which already include the underscore, plus the dash):

/[\w\-_]+/gm

And an input that looks like this:

This is a cat. It is fat. That is a dog. It looks like a log. Fat-cat dog_log

It correctly matches all the words and skips the whitespace and punctuation, but I only want the first 3 words. I thought I could just append {1,3} to the end of the regex, but that gives an error. The regex tester I used can be found here: https://regex101.com/r/Ec1IAH/1

Upvotes: 0

Views: 1453

Answers (4)

Noelio Dutra

Reputation: 1

Try this regex to capture successive groups of n words in a text (here n = 3):

([\w\W]+?\s){3}
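
A quick check of what this pattern returns on the question's input, as a minimal JavaScript sketch (the variable names are illustrative and not part of the answer). Each repetition has to end in a whitespace character, so the match keeps any punctuation and ends with a trailing space:

```javascript
// Input string taken from the question.
const input =
  "This is a cat. It is fat. That is a dog. It looks like a log. Fat-cat dog_log";

// Without the g flag, match() returns only the first run of three
// whitespace-terminated chunks.
const firstThree = input.match(/([\w\W]+?\s){3}/);

console.log(JSON.stringify(firstThree[0])); // "This is a "

// Trim and split if the individual words are needed.
const words = firstThree[0].trim().split(/\s+/);
console.log(words); // [ 'This', 'is', 'a' ]
```

Because the last chunk must be followed by whitespace, a string that ends right after its third word would not match.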

Upvotes: 0

Kpax7

Reputation: 11

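The regexes in the other answers only work for English words, because \w here matches only ASCII letters, digits, and the underscore (A-Za-z0-9_). I modified @Gurmanjot Singh's regex to use \p{L}, which matches a letter from any script, and it worked like a charm. Note that {20} captures the first 21 words (use {2} for the first 3), and that \p{L} needs the u flag in JavaScript. You may find it helpful for non-English use cases: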

^(?:[\p{L}]+[^\p{L}]+){20}[\p{L}]+
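
As a rough JavaScript sketch of how this behaves, assuming {2} in place of {20} to get three words and a made-up sample sentence:

```javascript
// Hypothetical sample text; \p{L} matches a letter from any script.
const text = "Привет мир как дела сегодня";

// The u flag is required for \p{L} in JavaScript.
// {2} followed by one final word yields the first three words.
const letterWords = /^(?:\p{L}+[^\p{L}]+){2}\p{L}+/u;

const firstThreeWords = text.match(letterWords);
console.log(firstThreeWords[0]); // Привет мир как

// Digits, "-" and "_" are not letters, so they act as separators here.
const hyphenated = "Fat-cat dog_log".match(letterWords);
console.log(hyphenated[0]); // Fat-cat dog
```

The second call shows the trade-off: hyphenated words count as two words with this pattern, unlike with the [\w-] version.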

Upvotes: 0

Ryszard Czech

Reputation: 18611

Use

^\w+(?:-\w+)*(?:\s+\w+(?:-\w+)*){2}

Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    -                        '-'
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (2 times):
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the most amount
                             possible)):
--------------------------------------------------------------------------------
      -                        '-'
--------------------------------------------------------------------------------
      \w+                      word characters (a-z, A-Z, 0-9, _) (1
                               or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
    )*                       end of grouping
--------------------------------------------------------------------------------
  ){2}                     end of grouping
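
The pattern explained above could be exercised like this in JavaScript (a minimal sketch; the variable names and the second and third sample strings are illustrative):

```javascript
// Three words separated by whitespace only; hyphenated words count as one.
const pattern = /^\w+(?:-\w+)*(?:\s+\w+(?:-\w+)*){2}/;

// Input string taken from the question.
const input =
  "This is a cat. It is fat. That is a dog. It looks like a log. Fat-cat dog_log";

const fromQuestion = input.match(pattern);
console.log(fromQuestion[0]); // This is a

const withHyphen = "Fat-cat dog_log and more".match(pattern);
console.log(withHyphen[0]); // Fat-cat dog_log and

// Punctuation between the first three words prevents a match,
// because only whitespace is accepted as a separator.
const withComma = "Hello, world again".match(pattern);
console.log(withComma); // null
```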

Upvotes: 2

Gurmanjot Singh

Reputation: 10360

Try this regex:

^(?:[\w-]+[^\w-]+){2}[\w-]+

Explanation:

  • ^ - matches the start of the line
  • (?:[\w-]+[^\w-]+){2}
    • [\w-]+ - matches 1+ occurrences of a word character or -
    • [^\w-]+ - matches 1+ occurrences of any character that is neither a word character nor a -, i.e. anything other than letters, digits, underscore and -
    • {2} - repeats the two sub-patterns above 2 times
  • [\w-]+ - matches 1+ occurrences of a word character or -
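
To generalize the steps above from 3 words to N, the repetition count can be built dynamically as N - 1. A minimal JavaScript sketch, where firstWords is a hypothetical helper name and not part of the answer:

```javascript
// Hypothetical helper: returns the first n words of text as an array,
// or an empty array if text does not start with n words.
function firstWords(text, n) {
  if (!Number.isInteger(n) || n < 1) return [];

  // (word + separator) repeated n - 1 times, followed by one final word.
  const pattern = new RegExp(`^(?:[\\w-]+[^\\w-]+){${n - 1}}[\\w-]+`);
  const match = text.match(pattern);

  // Split the matched prefix back into its individual words.
  return match ? match[0].match(/[\w-]+/g) : [];
}

// Input string taken from the question.
const input =
  "This is a cat. It is fat. That is a dog. It looks like a log. Fat-cat dog_log";

console.log(firstWords(input, 3)); // [ 'This', 'is', 'a' ]
console.log(firstWords(input, 5)); // [ 'This', 'is', 'a', 'cat', 'It' ]
console.log(firstWords("one two", 3)); // []
```

Without the m flag, ^ anchors at the start of the whole string, so text with leading whitespace or punctuation returns an empty array in this sketch.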

Upvotes: 2
