Sahil Saxena

Reputation: 141

regular expression to match optional characters

I am struggling to find a regular expression that matches both of the formats below:

cmd1 = 'cmd:ifconfig:"PASS":"Fail":4:2'

cmd2 = 'cmd:ifconfig:"PASS"'

Below is my sample Python code:

import re
cmd_reg = r'cmd:(.*):\"(.*?)\"$'
result = re.findall(cmd_reg, cmd2)
print(result)      # output -> [('ifconfig', 'PASS')] Expectation: [('ifconfig', 'PASS', '', '', '')]
result = re.findall(cmd_reg, cmd1)
print(result)      # output -> [] Expectation: [('ifconfig', 'PASS', 'Fail', 4, 2)]

But I couldn't figure out a regular expression that produces the output given under Expectation.

Upvotes: 1

Views: 76

Answers (4)

Ken T

Reputation: 2553

I would suggest the following pattern:

:(\w*):"?(\w*)"?:?"?(\w*)"?:?"?(\w*)"?:?"?(\w*)"?

You can try the above pattern interactively at the following website:

https://regex101.com/r/M9bf6m/2
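To check the pattern locally rather than on the website, here is a minimal sketch that runs it with re.findall against the two strings from the question. The variable names (pattern, result1, result2) are only illustrative.

```python
import re

# The pattern suggested above, unchanged.
pattern = r':(\w*):"?(\w*)"?:?"?(\w*)"?:?"?(\w*)"?:?"?(\w*)"?'

cmd1 = 'cmd:ifconfig:"PASS":"Fail":4:2'
cmd2 = 'cmd:ifconfig:"PASS"'

# findall returns one tuple per match, with one entry per capture group.
result1 = re.findall(pattern, cmd1)
result2 = re.findall(pattern, cmd2)

print(result1)  # [('ifconfig', 'PASS', 'Fail', '4', '2')]
print(result2)  # [('ifconfig', 'PASS', '', '', '')]
```

Note that the numeric fields come back as strings ('4', '2'), so they would need an explicit int() conversion if integers are required.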

Upvotes: 0

Konrad Rudolph

Reputation: 545528

Python's re module can't capture multiple occurrences of a given group (a repeated group keeps only its last match), so capturing a variable number of fields fundamentally won't work with a single regular expression. Some other regex implementations do support this, by distinguishing between a match and a capture.
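A small sketch of that limitation, using a hypothetical pattern with one repeated group on the first string from the question: the overall match succeeds, but only the last repetition is kept.

```python
import re

cmd1 = 'cmd:ifconfig:"PASS":"Fail":4:2'

# The group ([^:]+) is repeated once per field; each repetition
# overwrites the previous capture, so only the last one survives.
m = re.match(r'cmd(?::([^:]+))+$', cmd1)

print(m.groups())  # ('2',)
```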

I believe your best bet is to

  1. match the overall expression and capture the command and the remainder, and
  2. iterate over the groups in the remainder using a second regex.

import re

cmd1 = 'cmd:ifconfig:"PASS":"Fail":4:2'

cmd_pattern = r'^cmd:([^:]+):(.*)$'
group_pattern = r'"?([^:"]+)"?'  # or, simpler, r'[^:]+' to retain the quotes

cmd, groups = re.match(cmd_pattern, cmd1).groups()
parsed_groups = re.findall(group_pattern, groups)

For cmd2, parsed_groups will be ['PASS'], which I think makes more general sense than your desired result. If you need to fill the list with empty elements, you need to do this manually.
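If the padded result is what you need, the manual fill could look roughly like this. It is a sketch only: parse_cmd, n_fields and fill are names made up for illustration, and the maximum of four fields is an assumption taken from the example in the question.

```python
import re

CMD_PATTERN = re.compile(r'^cmd:([^:]+):(.*)$')
FIELD_PATTERN = re.compile(r'"?([^:"]+)"?')

def parse_cmd(text, n_fields=4, fill=''):
    """Return (command, field1, ..., fieldN), padding missing fields with `fill`."""
    match = CMD_PATTERN.match(text)
    if match is None:
        return None  # not a cmd string at all
    cmd, rest = match.groups()
    fields = FIELD_PATTERN.findall(rest)
    # Pad to a fixed length; a non-positive multiplier yields an empty list.
    fields += [fill] * (n_fields - len(fields))
    return (cmd, *fields)

print(parse_cmd('cmd:ifconfig:"PASS":"Fail":4:2'))  # ('ifconfig', 'PASS', 'Fail', '4', '2')
print(parse_cmd('cmd:ifconfig:"PASS"'))             # ('ifconfig', 'PASS', '', '', '')
```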


As an alternative, you could hard-code the four groups, and make them optional:

cmd_pattern = r'^cmd:([^:]+):([^:]+)(?::([^:]+))?(?::([^:]+))?(?::([^:]+))?'
re.match(cmd_pattern, cmd1).groups()
# ('ifconfig', '"PASS"', '"Fail"', '4', '2')

re.match(cmd_pattern, cmd2).groups()
# ('ifconfig', '"PASS"', None, None, None)

… I don’t recommend this. And this complex expression doesn’t even handle optional quotes yet, which would make it even more complex.
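For completeness, here is roughly what the hard-coded variant might look like once optional quotes are handled. This is only a sketch: FIELD, CMD_PATTERN and parse are illustrative names, and building the pattern by string repetition is just one way to keep it readable.

```python
import re

# One field; surrounding quotes are optional and excluded from the capture.
FIELD = r'"?([^:"]+)"?'

# One required field followed by up to three optional ':field' parts.
CMD_PATTERN = re.compile(
    r'^cmd:([^:]+):' + FIELD + (r'(?::' + FIELD + r')?') * 3 + r'$'
)

def parse(cmd):
    groups = CMD_PATTERN.match(cmd).groups()
    # Optional groups that did not participate are None; map them to ''.
    return tuple(g or '' for g in groups)

print(parse('cmd:ifconfig:"PASS":"Fail":4:2'))  # ('ifconfig', 'PASS', 'Fail', '4', '2')
print(parse('cmd:ifconfig:"PASS"'))             # ('ifconfig', 'PASS', '', '', '')
```

The number of fields is still fixed in the pattern, so a fifth field would make the match fail rather than being captured.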

Upvotes: 1

Shadow_Ninja

Reputation: 3

cmd1 = 'cmd:ifconfig:"PASS":"":4:'

cmd2 = 'cmd:ifconfig:"PASS"'

import re
cmd_reg = r'cmd:(.*):\"(.*)(:\"\":(\d):)?$'
results = [re.findall(cmd_reg, cmd) for cmd in (cmd1, cmd2)]
print(results)

Upvotes: 0

Yossi Levi

Reputation: 1268

In general, there are many ways to implement this. If you give more examples, the regex can be fitted to the general case (rather than over-fitted to this one). I followed the same approach as you and tried to match any digit that sits between two : delimiters and comes after two " characters, with this whole trailing part being optional. Try this:

cmd1 = 'cmd:ifconfig:"PASS":"":4:'

cmd2 = 'cmd:ifconfig:"PASS"'

import re
cmd_reg = r'cmd:(.*):\"(.*)(:\"\":(\d):)?$'
result = re.findall(cmd_reg, cmd2)
print(result)
result = re.findall(cmd_reg, cmd1)
print(result)

output:

[('ifconfig', 'PASS"', '', '')]
[('ifconfig:"PASS"', '":4:', '', '')]

Upvotes: 0
