V Anon

Reputation: 543

regex - matching not necessarily consecutive occurrences

I am trying to match strings in which the digit 0 occurs between 3 and 5 times.

The zeros do not have to be adjacent; only the total count per string matters.

So far I have:

egrep '[0]{3,5}' *.txt

Expected output:

20001 [valid]

200134 [invalid]

20103040 [valid]

203004038002 [invalid]

However, this only matches strings in which the zeros are consecutive.

How can I modify the pattern so that it also matches zeros that are not necessarily consecutive?

Upvotes: 0

Views: 404

Answers (4)

Ed Morton

Reputation: 203324

An ERE to match integers containing 3-5 0s, if that's what you want, is ^([1-9]*0){3,5}[1-9]*$, e.g.:

$ grep -E '^([1-9]*0){3,5}[1-9]*$' file
20001
20103040

The difference between this and @Toto's answer is that this one matches only integers, while @Toto's also matches lines in which any characters other than 0 appear around the 0s, e.g.:

$ echo '0 foo 0 bar 0' | grep -E '^([1-9]*0){3,5}[1-9]*$'
$ echo '0 foo 0 bar 0' | grep -E '^([^0]*0){3,5}[^0]*$'
0 foo 0 bar 0

Upvotes: 0

superkytoz

Reputation: 1279

The regex you're looking for is:

^(?!(?:.*?0){6,})(?=(?:.*?0){3,})[0-9]+$

Input file:

cat file.txt
20001
200134
20103040
203004038002

Command:

I use grep -P here because lookaround syntax such as (?! is not supported by egrep.

grep -P '^(?!(?:.*?0){6,})(?=(?:.*?0){3,})[0-9]+$' file.txt
20001
20103040

Explanation: First, a negative lookahead rejects any string that contains six or more 0s anywhere. Then a positive lookahead requires the string to contain at least three 0s. Together they limit the count to 3-5, and [0-9]+ ensures the string consists only of digits.

^ anchors the match at the start of the string, and $ anchors it at the end.
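To see where the two lookaheads draw the line, here is a small sketch that feeds the same regex inputs with 2, 3, 5 and 6 zeros. The input values are made up for illustration, and it assumes a grep built with PCRE support (GNU grep's -P option):

```shell
# Hypothetical inputs containing 2, 3, 5 and 6 zeros respectively.
inputs='100
1000
100000
1000000'

# -P selects PCRE, which the lookaheads require.
kept=$(printf '%s\n' "$inputs" | grep -P '^(?!(?:.*?0){6,})(?=(?:.*?0){3,})[0-9]+$')
printf '%s\n' "$kept"
# prints 1000 and 100000
```

Only the 3-zero and 5-zero inputs survive: the positive lookahead drops 100, and the negative lookahead drops 1000000.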

Upvotes: 0

Toto

Reputation: 91385

Input file:

cat file.txt
10203
1020304
102030405
10203040506
1020304050607

Command:

egrep '^([^0]*0){3,5}[^0]*$' file.txt
1020304
102030405
10203040506

Explanation:

^                   # beginning of line
    (               # start group
        [^0]*       # 0 or more characters other than 0
        0           # 1 zero
    ){3,5}          # group must appear from 3 to 5 times
    [^0]*           # 0 or more characters other than 0
$                   # end of line
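As a quick check against the question's own sample values, the same pattern can be run on them directly. This is only a sketch; the variable names are mine:

```shell
# Sample values taken from the question, one per line.
samples='20001
200134
20103040
203004038002'

# Keep only lines with three to five zeros, consecutive or not.
matches=$(printf '%s\n' "$samples" | grep -E '^([^0]*0){3,5}[^0]*$')
printf '%s\n' "$matches"
# prints 20001 and 20103040
```

200134 is dropped because it has only two zeros, and 203004038002 because it has six.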

Upvotes: 0

Jamie - Decodefy Ltd

Reputation: 1397

I came up with this solution, which checks for 3-5 0s, each possibly surrounded by any characters other than 0 or whitespace. Hope this helps :)

\b(?:[^0\s]*?0[^0\s]*?){3,5}\b

If you're checking ONLY strings of digits, with no breaks or other characters in between, you could swap the \b anchors for ^ and $ and replace [^0\s] with [1-9] so that only digits are allowed:

^(?:[1-9]*?0[1-9]*?){3,5}$

^ matches the start of the string, and $ matches the end of the string.
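For what it's worth, here is a rough sketch of the word-boundary version pulling matching tokens out of a line of free text. The input line is invented, and -o and -P assume GNU grep with PCRE support, since (?: and lazy quantifiers are not part of ERE:

```shell
# Hypothetical line mixing tokens with 0, 3, 3 and 2 zeros.
line='abc 20001 x0y0z0 200134'

# -o prints each match on its own line; -P enables PCRE syntax.
tokens=$(printf '%s\n' "$line" | grep -oP '\b(?:[^0\s]*?0[^0\s]*?){3,5}\b')
printf '%s\n' "$tokens"
# prints 20001 and x0y0z0
```

Note that \b also sits next to punctuation, so tokens containing characters such as - may be matched only in part.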

Upvotes: 1
