Reputation: 14935
I have a text similar to the text below. It contains four-digit numbers, each of which follows either digit- or whitespace and is followed by a period (.), a question mark (?), -digit, or whitespace.
I need to match all of the four-digit numbers in the first paragraph but none in the second, since those do not meet my conditions.
Lorem ipsum 3400-digit, sit amet 5000 consectetur adipisicing elit. Natus, explicabo 6700? Itaque iure ipsum laboriosam, ex nemo delectus iste quia cupiditate digit-9134? Iste nam digit-2456 at voluptate est 8456-digit? At excepturi quis voluptatibus 7500.
Lorem ipsum $5000 dolor sit amet consectetur adipisicing elit. Obcaecati tempora dolorum repellat reiciendis cum soluta deserunt ex voluptatibus, nam illum veniam £5550 quidem aperiam sequi, nostrum sed? Quidem eveniet maiores #5550 autem. https://codepen.io/pen/5000/3454
There are a few similar questions already on StackOverflow. I have gone through some of them (links below), but I still cannot do this. Before marking this question as a duplicate, please check that your solution finds all occurrences of the four-digit numbers in the first paragraph but none in the second paragraph.
Upvotes: 6
Views: 15147
Reputation: 3474
(\s|digit-)([0-9]{4})(?=-digit|\.|\?|\s)
You need an OR statement (alternation) at the beginning and end of your query, with four digits in the middle.
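As a rough sketch of how this pattern might be applied in JavaScript: the g flag, the use of matchAll, and the variable names below are assumptions added for illustration and are not part of the pattern above. The two strings are the paragraphs from the question.

```javascript
// The two sample paragraphs from the question.
const first = "Lorem ipsum 3400-digit, sit amet 5000 consectetur adipisicing elit. Natus, explicabo 6700? Itaque iure ipsum laboriosam, ex nemo delectus iste quia cupiditate digit-9134? Iste nam digit-2456 at voluptate est 8456-digit? At excepturi quis voluptatibus 7500.";
const second = "Lorem ipsum $5000 dolor sit amet consectetur adipisicing elit. Obcaecati tempora dolorum repellat reiciendis cum soluta deserunt ex voluptatibus, nam illum veniam £5550 quidem aperiam sequi, nostrum sed? Quidem eveniet maiores #5550 autem. https://codepen.io/pen/5000/3454";

// The pattern above, with the g flag added so that every match is returned.
const pattern = /(\s|digit-)([0-9]{4})(?=-digit|\.|\?|\s)/g;

// Group 1 is the leading whitespace or "digit-"; group 2 holds the four digits.
const extract = (text) => Array.from(text.matchAll(pattern), (m) => m[2]);

const fromFirst = extract(first);
const fromSecond = extract(second);

console.log(fromFirst);  // the seven numbers of the first paragraph
console.log(fromSecond); // an empty array
```

Note that this pattern requires a character on both sides of the number, so a number at the very start or very end of the string would not be matched.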
To explain further:
(\s|digit-)
- a capturing group: either whitespace or digit-
[0-9]{4}
- a digit from 0 to 9, exactly four times (captured as group 2 in the full pattern)
(?=-digit|\.|\?|\s)
- positive lookahead: either -digit, a . (escaped because . is a special character in regex), a question mark (also escaped for the same reason), or whitespace.
Upvotes: 1
Reputation: 626748
You may use the following pattern:
/(?:\bdigit-|\s|^)(\d{4})(?=[.?\s]|-digit\b|$)/gi
See the regex demo. You need to get all Group 1 values.
Details
(?:\bdigit-|\s|^)
- either digit- (as a whole word), whitespace or start of string
(\d{4})
- Group 1: four digits
(?=[.?\s]|-digit\b|$)
- immediately to the right, there must be a ., whitespace, ?, -digit (as a whole word) or end of string.
NOTE: Without a lookahead, consecutive whitespace-separated matches will be left out.
JS demo:
var strs = ["Lorem ipsum 3400-digit, sit amet 5000 consectetur adipisicing elit. Natus, explicabo 6700? Itaque iure ipsum laboriosam, ex nemo delectus iste quia cupiditate digit-9134? Iste nam digit-2456 at voluptate est 8456-digit? At excepturi quis voluptatibus 7500.", "Lorem ipsum $5000 dolor sit amet consectetur adipisicing elit. Obcaecati tempora dolorum repellat reiciendis cum soluta deserunt ex voluptatibus, nam illum veniam £5550 quidem aperiam sequi, nostrum sed? Quidem eveniet maiores #5550 autem. https://codepen.io/pen/5000/3454" ];
var rx = /(?:\bdigit-|\s|^)(\d{4})(?=[.?\s]|-digit\b|$)/gi;
for (var s of strs) {
  var m, res = [];
  // Collect Group 1 (the four digits) from every match in the current string
  while (m = rx.exec(s)) {
    res.push(m[1]);
  }
  console.log(res);
}
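The note about the lookahead can be checked with a small comparison. This is only a sketch: the input string and the "consuming" variant of the pattern are hypothetical, made up for illustration. When the trailing whitespace is consumed by the match, it is no longer available as the leading delimiter of the next number.

```javascript
// Hypothetical input: two qualifying numbers separated by a single space.
const sample = "1234 5678";

// Hypothetical variant that consumes the trailing delimiter instead of looking ahead.
const consuming = /(?:\bdigit-|\s|^)(\d{4})(?:[.?\s]|-digit\b|$)/gi;
// The pattern from this answer, which only peeks at the trailing delimiter.
const lookahead = /(?:\bdigit-|\s|^)(\d{4})(?=[.?\s]|-digit\b|$)/gi;

// Return the Group 1 value of every match.
const collect = (re, s) => Array.from(s.matchAll(re), (m) => m[1]);

const withConsuming = collect(consuming, sample);
const withLookahead = collect(lookahead, sample);

console.log(withConsuming); // only the first number
console.log(withLookahead); // both numbers
```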
Upvotes: 9