Reputation: 1999
I have a regex that looks like this (all word characters plus underscore and dash; \w already includes the underscore, so listing it again is redundant but harmless):
/[\w\-_]+/gm
And an input that looks like this:
This is a cat. It is fat. That is a dog. It looks like a log. Fat-cat dog_log
It correctly matches all the words, skipping whitespace and punctuation. But I only want to get the first 3 words. I thought I could just add {1,3} to the end of the regex to get this result, but that gives an error. The regex tester I used can be found here: https://regex101.com/r/Ec1IAH/1
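A quantifier such as {1,3} applies to the token immediately before it, so writing it after + stacks two quantifiers, which many engines reject. Here is a minimal sketch in Python (an assumption, since the question names no language) that collects every match and keeps the first three. The helper names first_words and compiles are hypothetical.

```python
import re

TEXT = ("This is a cat. It is fat. That is a dog. "
        "It looks like a log. Fat-cat dog_log")

# \w already includes the underscore, so [\w-] is equivalent to [\w\-_].
WORD = re.compile(r"[\w-]+")


def first_words(text, count=3):
    """Return the first `count` words found in `text` (hypothetical helper)."""
    return WORD.findall(text)[:count]


def compiles(pattern):
    """Report whether Python's re module accepts `pattern`."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(first_words(TEXT))            # ['This', 'is', 'a']
print(compiles(r"[\w\-_]+{1,3}"))   # False: a quantifier cannot follow '+'
```

Slicing the list of matches keeps the original pattern unchanged and leaves the word limit as an ordinary parameter.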
Upvotes: 0
Views: 1453
Reputation: 1
Try this regex to capture multiple groups of n words in a text (here n = 3):
([\w\W]+?\s){3}
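To see what this pattern actually returns, here is a small Python sketch (Python is an assumption; the answer names no language). Because [\w\W] matches any character, each "word" is really a run of characters up to the next whitespace, so punctuation and the trailing space end up in every match. The sketch uses a non-capturing group so that the whole chunk is easy to read back.

```python
import re

TEXT = ("This is a cat. It is fat. That is a dog. "
        "It looks like a log. Fat-cat dog_log")

# Non-capturing variant of the pattern above, so group(0) is the whole chunk.
CHUNK = re.compile(r"(?:[\w\W]+?\s){3}")

chunks = [m.group(0) for m in CHUNK.finditer(TEXT)]

print(repr(chunks[0]))   # 'This is a ' (note the trailing space)
print(repr(chunks[1]))   # 'cat. It is '
```

The final three tokens of the input are not returned, because the last one is not followed by whitespace.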
Upvotes: 0
Reputation: 11
The regexes in the other answers work only for English words, because in many flavors \w matches only A-Z, a-z, 0-9 and underscore. I modified @Gurmanjot Singh's regex to work for words in any language, and it worked like a charm. You may find it helpful for non-English use cases (the {20} below takes the first 21 words; use {2} for the first 3):
^(?:[\p{L}]+[^\p{L}]+){20}[\p{L}]+
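\p{L} needs an engine with Unicode property support (for example JavaScript with the u flag, PCRE, or Java). Python's built-in re module does not support \p{L}, so the sketch below swaps in [^\W\d_] as a rough stand-in for "any letter". That substitution and the helper name first_n_words are assumptions, not part of the answer.

```python
import re

LETTER = r"[^\W\d_]"      # word characters minus digits and underscore
NON_LETTER = r"[\W\d_]"   # everything else


def first_n_words(text, n):
    """Return the leading n letter-only words of text, or None if there are fewer."""
    # n words need n - 1 repetitions of "word + separator", then one last word.
    pattern = rf"^(?:{LETTER}+{NON_LETTER}+){{{n - 1}}}{LETTER}+"
    match = re.match(pattern, text)
    return match.group(0) if match else None


print(first_n_words("Привет мир это тест", 3))   # Привет мир это
print(first_n_words("one two", 3))                # None
```

Note the off-by-one: the repetition count is always one less than the number of words wanted.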
Upvotes: 0
Reputation: 18611
Use
^\w+(?:-\w+)*(?:\s+\w+(?:-\w+)*){2}
See proof
Explanation
^                  the beginning of the string
\w+                word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
(?:                group, but do not capture (0 or more times (matching the most amount possible)):
  -                  '-'
  \w+                word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
)*                 end of grouping
(?:                group, but do not capture (2 times):
  \s+                whitespace (\n, \r, \t, \f, and " ") (1 or more times (matching the most amount possible))
  \w+                word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
  (?:                group, but do not capture (0 or more times (matching the most amount possible)):
    -                  '-'
    \w+                word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
  )*                 end of grouping
){2}               end of grouping
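As a quick check of the pattern above, here is a Python sketch (Python and the helper name first_three are assumptions). It also shows a limitation: the words must be separated by whitespace only, so punctuation inside the first three words prevents a match.

```python
import re

# The answer's pattern: words may contain internal hyphens and must be
# separated by whitespace only.
FIRST_THREE = re.compile(r"^\w+(?:-\w+)*(?:\s+\w+(?:-\w+)*){2}")


def first_three(text):
    """Return the first three whitespace-separated words, or None."""
    match = FIRST_THREE.match(text)
    return match.group(0) if match else None


print(first_three("This is a cat. It is fat."))   # This is a
print(first_three("Fat-cat dog_log and more"))    # Fat-cat dog_log and
print(first_three("cat. It is fat."))             # None: '.' is not whitespace
```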
Upvotes: 2
Reputation: 10360
Try this regex:
^(?:[\w-]+[^\w-]+){2}[\w-]+
Explanation:
^ - matches the start of the line
(?:[\w-]+[^\w-]+){2}
[\w-]+ - matches 1+ occurrences of a word character or -
[^\w-]+ - matches 1+ occurrences of any character that is neither a word character nor a -, i.e. every character other than letters, digits, underscore and -
{2} - repeats the above 2 steps 2 times
[\w-]+ - matches 1+ occurrences of a word character or -
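Unlike a whitespace-only separator, [^\w-]+ also skips punctuation between words, so the matched span can contain it. Here is a short Python sketch (Python is an assumption) that takes the span and then pulls the individual words out of it with a second pattern.

```python
import re

FIRST_THREE = re.compile(r"^(?:[\w-]+[^\w-]+){2}[\w-]+")
WORD = re.compile(r"[\w-]+")

match = FIRST_THREE.match("cat. It is fat. Fat-cat dog_log")
span = match.group(0)        # the text covering the first three words
words = WORD.findall(span)   # the words themselves, separators dropped

print(span)    # cat. It is
print(words)   # ['cat', 'It', 'is']
```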
Upvotes: 2