Reputation: 2903
I would like to parse a string with or without brackets. For john[doe], I would like to get two variables: the part outside the [] and the part inside the brackets. So for this example I would like to extract john and doe. The string will always have this structure, but another example can be just john, in which case the second variable should be "" or None. How can I do this using the re library? Or just plain Python, if it's more efficient than regex?
This is what I tried so far:
s = sample_string.split("[")
x, y = (sample_string, None) if len(s) == 1 else (s[0], s[1][:-1])
Upvotes: 2
Views: 1588
Reputation: 1
As long as john[doe] is a string type, you should be able to parse the phrase using the replace function:
x = 'john[doe]'
# Replace "[" with a space and drop "]"; no regex is needed for this
new_x = x.replace("[", " ").replace("]", "")
print(new_x)  # john doe
Or, if you prefer, you can use the re.match function:
import re
x = str('john[doe]')
m = re.match(r"(?P<first_name>\w+)\[(?P<last_name>\w+)\]", x)
name = m.group('first_name') + " " + m.group('last_name')
print(name)
Without having more phrases to parse, I am not sure which of the two is faster. Good luck! :)
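Note that the match pattern above requires the brackets, so re.match returns None for a plain john. A minimal sketch of one way to make the bracketed part optional, assuming names consist only of \w characters as in the pattern above (split_name and NAME_RE are hypothetical names used for illustration):

```python
import re

# Same named groups as above, but the "[last_name]" part is wrapped in an
# optional non-capturing group, so a bare first name also matches.
NAME_RE = re.compile(r"(?P<first_name>\w+)(?:\[(?P<last_name>\w+)\])?")

def split_name(text):
    """Return (first_name, last_name); last_name is None without brackets."""
    m = NAME_RE.fullmatch(text)
    if m is None:
        return None  # the input does not have the expected structure
    return m.group("first_name"), m.group("last_name")

print(split_name("john[doe]"))  # ('john', 'doe')
print(split_name("john"))       # ('john', None)
```

Using fullmatch rather than match means malformed input such as john[ is rejected instead of being partially matched.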
Upvotes: 0
Reputation: 44108
A regex solution:
r'^([^[]+)(?:\[([^\]]+)])?$'
^           Matches start of string.
([^[]+)     Capture group 1: matches 1 or more characters that are not '['.
(?:         Start of non-capturing group.
\[          Matches '['.
([^\]]+)    Capture group 2: matches 1 or more characters that are not ']'.
]           Matches ']'.
)           End of non-capturing group.
import re
tests = ['john', 'john[doe]']
for test in tests:
    m = re.match(r'^([^[]+)(?:\[([^\]]+)])?$', test)
    if m:
        print(test, '->', m[1], m[2])
Prints:
john -> john None
john[doe] -> john doe
Explanations
First, anything between parentheses ( ) is a capturing group. Anything between (?: ) is a non-capturing group. Either of these types of groups can contain capturing and non-capturing groups within. [] is used to define a set of characters. For example, [aqw] matches 'a', 'q' or 'w'. [a-e] matches 'a', 'b', 'c', 'd' or 'e'. [^aqw], with a leading ^, negates the set, meaning it matches any character other than 'a', 'q' or 'w'. So [^\]] matches any character other than ']' (you have to put a \ character in front of the ] character to "escape" it, because in that context ] has special meaning: it would otherwise close the [] construct). The following + sign denotes "one or more of what preceded this". So ([^[]+) matches one or more of any character that is not a [.
I hope the preceding explanations help.
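The character-set rules described above can be checked directly. This is only a small illustrative sketch; matches is a hypothetical helper, not part of the re module:

```python
import re

def matches(pattern, text):
    """Return True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"[aqw]", "q"))      # True: 'q' is in the set
print(matches(r"[aqw]", "b"))      # False: 'b' is not in the set
print(matches(r"[a-e]", "c"))      # True: 'c' is within the range a-e
print(matches(r"[^aqw]", "b"))     # True: negated set, 'b' is allowed
print(matches(r"[^\]]+", "doe"))   # True: one or more characters, none is ']'
print(matches(r"[^\]]+", "do]e"))  # False: contains ']'
print(matches(r"[^[]+", "john"))   # True: one or more characters, none is '['
```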
Upvotes: 4
Reputation: 33
There are probably better ways to do it but this worked for me.
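import re

s = "john[doe]"
arr = []
# Text after "[" (this assumes the string contains a bracket)
x = re.split(r"\[", s)[1]
arr.append(re.split(r"\[", s)[0])  # part before "["
arr.append(re.split(r"\]", x)[0])  # part before "]"
print(arr)  # ['john', 'doe']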
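The snippet above raises an IndexError when the string has no brackets, because re.split then returns a single element. A possible variation, sketched under the assumption that the input is either name or name[other], splits on both bracket characters in one call (split_brackets is a hypothetical name):

```python
import re

def split_brackets(s):
    """Return [outside, inside]; inside is None when there are no brackets."""
    # "john[doe]" -> ['john', 'doe', ''], "john" -> ['john']
    parts = re.split(r"[\[\]]", s)
    outside = parts[0]
    inside = parts[1] if len(parts) > 1 else None
    return [outside, inside]

print(split_brackets("john[doe]"))  # ['john', 'doe']
print(split_brackets("john"))       # ['john', None]
```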
Upvotes: -1
Reputation: 2292
Is it a requirement that you use regex for this? It's probably easier without:
if '[' in string:
    x, y = string.split('[')
    y = y.strip(']')
else:
    x, y = string, ''
A version that uses regex might look like this:
import re

if '[' in string:
    x, y = re.findall(r'^(.+)\[(.+)?]', string)[0]
else:
    x, y = string, ''
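Since the question asks which approach is more efficient, one way to find out is to measure both on your own inputs. The sketch below compares a plain-Python version based on str.partition with a precompiled regex from this thread, using timeit. The function names are hypothetical, the input is assumed to always be name or name[other], and no timings are shown because they depend on the machine and Python version:

```python
import re
import timeit

PATTERN = re.compile(r'^([^[]+)(?:\[([^\]]+)])?$')

def parse_partition(s):
    """Plain Python: split once at the first '['."""
    outside, bracket, rest = s.partition("[")
    # `bracket` is '' when there is no '['; otherwise drop the trailing ']'
    inside = rest[:-1] if bracket else None
    return outside, inside

def parse_regex(s):
    """Regex version; assumes the input always matches the pattern."""
    m = PATTERN.match(s)
    return m[1], m[2]

print(parse_partition("john[doe]"), parse_partition("john"))
# ('john', 'doe') ('john', None)
print(parse_regex("john[doe]"), parse_regex("john"))
# ('john', 'doe') ('john', None)

# Time each function on the same input
for func in (parse_partition, parse_regex):
    seconds = timeit.timeit(lambda: func("john[doe]"), number=100_000)
    print(func.__name__, seconds)
```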
Upvotes: 0