Ira

Reputation: 567

Regex match a pattern occurring multiple times in a string

Using grep in Ubuntu, I am trying to regex match a pattern that is repeated multiple times in a line.

Example: 0:0, 80:3, 443:0, 8883:0, 9000:0, 9001:0,

The regex I tried is -

([0-9]+:[0-9]+, )+

But it only matches up to -

0:0, 80:3, 443:0, 8883:0, 9000:0,

I want it to match the complete line. I'd also appreciate it if the regex checked that both 80 and 443 are present in the matched string.

Expectation -

The following lines should be matched -

0:0, 80:3, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 443:1, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:3, 443:1, 8883:0, 9000:0, 9001:0,

and the ones below should not be matched -

0:0, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 8883:0, 9000:0, 9001:0,
0:0, 8883:0, 9000:0, 9001:0,

Upvotes: 1

Views: 2437

Answers (2)

RavinderSingh13

Reputation: 133458

Here is a more robust awk pattern match, based on your shown samples. It was written and tested in GNU awk and should work in any POSIX-compliant awk. A brief explanation of the code: awk works on a condition/regexp-then-action model. Only the condition/regexp is given here, with no action, so when the regexp matches, the default action (printing the line) takes place.

awk '/^0:[0-9],[[:space:]]+80:[0-9],[[:space:]]+443:[0-9],[[:space:]]+8883:[0-9](,[[:space:]]+9[0-9]{3}:[0-9]){2},$/' Input_file

Explanation: a detailed breakdown of the above regex.

^0:[0-9],[[:space:]]+             ##From the start of the line, match 0, a colon, any single digit, a comma, and one or more whitespace characters.
80:[0-9],[[:space:]]+             ##Followed by 80, a colon, any single digit, a comma, and one or more whitespace characters.
443:[0-9],[[:space:]]+            ##Followed by 443, a colon, any single digit, a comma, and one or more whitespace characters.
8883:[0-9]                        ##Followed by 8883, a colon, and any single digit.
(,[[:space:]]+9[0-9]{3}:[0-9]){2} ##Match a comma, whitespace, 9 followed by 3 digits, a colon, and a single digit; repeat this group exactly 2 times (for the last two values, e.g. 9000 and 9001).
,$                                ##Match the trailing comma at the end of the line.

Upvotes: 2

Wiktor Stribiżew

Reputation: 626738

You can use

^[0-9]+:[0-9]+, 80:[0-9]+, 443:[0-9]+(, [0-9]+:[0-9]+)+,$

See the regex demo.
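
Since the question uses grep, here is a minimal sketch of how the pattern above could be run with grep -E (extended regular expressions). The original attempt, ([0-9]+:[0-9]+, )+, presumably stops before 9001:0, because every repetition requires a space after the comma, and the last item ends with a bare comma. The pattern above avoids this by putting the ", " separator in front of each repeated item and matching the final comma separately. The function name match_ports is hypothetical, and only three of the sample lines from the question are used.

```shell
#!/bin/bash
# match_ports is a hypothetical helper name; the regex is the one shown above.
# -E enables extended regular expressions, so +, ( ) need no backslashes.
match_ports() {
  grep -E '^[0-9]+:[0-9]+, 80:[0-9]+, 443:[0-9]+(, [0-9]+:[0-9]+)+,$'
}

# One line that should match, followed by two that should not
# (443 without 80, and 80 without 443).
printf '%s\n' \
  '0:0, 80:3, 443:0, 8883:0, 9000:0, 9001:0,' \
  '0:0, 443:0, 8883:0, 9000:0, 9001:0,' \
  '0:0, 80:0, 8883:0, 9000:0, 9001:0,' | match_ports
```

Note that this pattern expects 80 and 443 to be the second and third items, as in the samples; it does not look for them elsewhere in the line.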

Also, consider an awk solution like

awk '/^[0-9]+:[0-9]+(, [0-9]+:[0-9]+)+,$/ && /80/ && /443/' file

See the online demo:

#!/bin/bash
s='0:0, 80:3, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 443:1, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:3, 443:1, 8883:0, 9000:0, 9001:0,
0:0, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 8883:0, 9000:0, 9001:0,
0:0, 8883:0, 9000:0, 9001:0,'
awk '/^[0-9]+:[0-9]+(, [0-9]+:[0-9]+)+,$/ && /80/ && /443/' <<< "$s"

Output:

0:0, 80:3, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 443:1, 8883:0, 9000:0, 9001:0,
0:0, 80:0, 443:0, 8883:0, 9000:0, 9001:0,
0:0, 80:3, 443:1, 8883:0, 9000:0, 9001:0,
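
One caveat: /80/ and /443/ are plain substring tests, so a line containing, say, 8080 or 4430 would also pass them. If that matters, a stricter variant might test for the port as a whole key. The sketch below is an assumption layered on top of the command above, not part of the original answer: it prepends ", " to the line so that every key, including the first, is preceded by the separator, then uses index() to look for ", 80:" and ", 443:". The function name strict_ports and the two input lines are made up for illustration.

```shell
#!/bin/bash
# strict_ports is a hypothetical helper name.
# The first condition checks the overall shape of the line.
# index(", " $0, ", 80:") is non-zero only when 80 appears as a complete key.
strict_ports() {
  awk '/^[0-9]+:[0-9]+(, [0-9]+:[0-9]+)+,$/ &&
       index(", " $0, ", 80:") && index(", " $0, ", 443:")'
}

# The second line has port 8080 but not 80, so only the first line is printed.
printf '%s\n' \
  '0:0, 80:3, 443:0, 8883:0,' \
  '0:0, 8080:3, 443:0, 8883:0,' | strict_ports
```

Using index() instead of a regex keeps the key check free of any regex dialect differences between awk implementations.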

Upvotes: 2
