Reputation: 1089
I am trying to parse a comma-separated string whose entries have the form keyword://pass@ip:port. The password can contain any character, including a comma, so I cannot simply split on commas.
I have tried using a regex to get the string after "myserver://", planning to split the rest (pass@ip:port/key1) with string operations afterwards, but I could not get it working: I cannot fetch the text that follows the keyword.
myserver:// is a hardcoded string, and I need to get whatever follows each myserver as a comma separated list (i.e. pass@ip:port/key1, pass2@ip2:port2/key2, etc)
This is the closest I can get:
import re
my_servers="myserver://password,123@ip:port/key1,myserver://pass2@ip2:port2/key2"
result = re.search(r'myserver:\/\/(.*)[,(.*)|\s]', my_servers)
Using search, I try to find an occurrence of the "myserver://" keyword followed by any characters and ending with either a comma (meaning another entry follows, as in myserver://zzz,myserver://qqq) or a space (in the case of a single myserver:// element; I do not know a better end indicator than a space). However, this does not come out right. How can I do this better with regex?
Upvotes: 1
Views: 4547
Reputation: 627469
You may consider the following splitting approach if you do not need to keep myserver:// in the results (on Python 3, wrap filter in list(), since filter returns a lazy iterator there):
list(filter(None, re.split(r'\s*,?\s*myserver://', s)))
The \s*,?\s*myserver:// pattern matches an optional , enclosed with 0+ whitespaces, followed by the myserver:// substring. Note that the empty entries must be removed: when a match is found at the start of the string, re.split adds an empty string at the beginning of the resulting list.
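To make the empty leading entry concrete, here is a small sketch using the sample string from the question. The variable names parts and cleaned are only illustrative.

```python
import re

s = "myserver://password,123@ip:port/key1,myserver://pass2@ip2:port2/key2"

# The delimiter matches at the very start of the string, so re.split
# returns an empty string as the first item.
parts = re.split(r'\s*,?\s*myserver://', s)
print(parts)

# filter(None, ...) drops the falsy (empty) items; list() materialises
# the result, which matters on Python 3 where filter() is lazy.
cleaned = list(filter(None, parts))
print(cleaned)
```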
Alternatively, you can use a lookahead-based pattern with lazy dot matching and re.findall:
rx = r"myserver://(.*?)(?=\s*,\s*myserver://|$)"
Details:
myserver:// - a literal substring
(.*?) - Capturing group 1, whose contents will be returned by re.findall, matching any 0+ chars other than line break chars, as few as possible, up to (but excluding) the first position where the lookahead below succeeds
(?=\s*,\s*myserver://|$) - a positive lookahead that requires either of the 2 alternatives:
    \s*,\s*myserver:// - a , enclosed with 0+ whitespaces and then a literal myserver:// substring
    | - or
    $ - end of string.
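As a quick check of the two lookahead alternatives, the sketch below runs the same pattern on a single-entry string (where only $ can end the match) and on a string with spaces around the separating comma. The input strings single and spaced are made up for illustration.

```python
import re

rx = r"myserver://(.*?)(?=\s*,\s*myserver://|$)"

# Only one entry: the lazy group has to run to the end of the string,
# so the comma inside the password stays part of the result.
single = "myserver://password,123@ip:port/key1"

# Two entries with whitespace around the separating comma: the lookahead
# stops the first group before " , myserver://".
spaced = "myserver://password,123@ip:port/key1 , myserver://pass2@ip2:port2/key2"

one = re.findall(rx, single)
two = re.findall(rx, spaced)
print(one)
print(two)
```

No trailing space or other artificial end marker is needed for the single-entry case, because $ covers it.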
Here is a Python demo of both approaches:
import re
s = "myserver://password,123@ip:port/key1,myserver://pass2@ip2:port2/key2"
rx1 = r'\s*,?\s*myserver://'
res1 = list(filter(None, re.split(rx1, s)))  # list() so that Python 3 prints a list
print(res1)
# or
rx2 = r"myserver://(.*?)(?=\s*,\s*myserver://|$)"
res2 = re.findall(rx2, s)
print(res2)
Both will print ['password,123@ip:port/key1', 'pass2@ip2:port2/key2'].
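The question also mentions splitting each entry further (pass@ip:port/key1). A minimal sketch of that step follows, under some assumptions that are not stated in the question: the ip, port and key parts never contain @, the first / after the @ starts the key, and the ip is not an IPv6 address containing colons. The helper name parse_entry is hypothetical.

```python
import re

def parse_entry(entry):
    """Split 'pass@ip:port/key' into its fields (assumptions noted above)."""
    # Split at the LAST '@' so that ',' or '@' inside the password survive.
    password, _, rest = entry.rpartition('@')
    # The first '/' after the '@' separates ip:port from the key.
    host_port, _, key = rest.partition('/')
    # The first ':' separates ip from port (assumes no IPv6 address).
    ip, _, port = host_port.partition(':')
    return {"password": password, "ip": ip, "port": port, "key": key}

s = "myserver://password,123@ip:port/key1,myserver://pass2@ip2:port2/key2"
entries = re.findall(r"myserver://(.*?)(?=\s*,\s*myserver://|$)", s)
servers = [parse_entry(e) for e in entries]
print(servers)
```

Using rpartition rather than split('@') is the design choice here: the password is the only field allowed to contain arbitrary characters, so everything before the last @ is treated as the password.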
Upvotes: 3