Reputation: 2903

python regex parse string with brackets

I would like to parse a string with or without brackets. For john[doe], I would like to get two variables: the part outside the brackets and the part inside them. So for this example I would like to extract john and doe. The string will always have this structure, but it can also be just john, in which case the second variable should be "" or None. How can I do this using the re library? Or with plain Python, if that is more efficient than regex?

This is what I tried so far:

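sample_string = "john[doe]"
s = sample_string.split("[")
# no '[' -> the whole string is x; otherwise drop the trailing ']' from the second part
x, y = (sample_string, None) if len(s) == 1 else (s[0], s[1][:-1])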

Upvotes: 2

Answers (4)

Shop_till_ I_drop

Reputation: 1

As long as john[doe] is a string, you can parse it with str.replace:

x = 'john[doe]'
# turn '[' into a space and drop ']': 'john[doe]' -> 'john doe'
# (new_x.split() then gives ['john', 'doe'])
new_x = x.replace("[", " ").replace("]", "")
print(new_x)

Or, if you prefer, you can use re.match:

import re

x = 'john[doe]'
m = re.match(r"(?P<first_name>\w+)\[(?P<last_name>\w+)\]", x)
# re.match returns None when there is no bracketed part (e.g. 'john')
if m:
    name = m.group('first_name') + " " + m.group('last_name')
    print(name)

Without having more phrases to parse, I am not sure which of the two is faster. Good luck! :)
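If speed matters, the standard-library timeit module can compare the two approaches on your own data. This is a minimal sketch, assuming a single sample string; with_replace and with_regex are hypothetical wrapper names, and the timings will vary by machine and Python version, so no numbers are claimed here.

```python
import re
import timeit

# Pre-compile the pattern from the re.match example above
NAME_RE = re.compile(r"(?P<first_name>\w+)\[(?P<last_name>\w+)\]")

def with_replace(s):
    # str.replace approach: returns one space-separated string
    return s.replace("[", " ").replace("]", "")

def with_regex(s):
    # regex approach: returns None if there is no bracketed part
    m = NAME_RE.match(s)
    return m.group("first_name") + " " + m.group("last_name") if m else None

sample = "john[doe]"
for func in (with_replace, with_regex):
    # timeit.timeit accepts a zero-argument callable as its statement
    seconds = timeit.timeit(lambda: func(sample), number=100_000)
    print(f"{func.__name__}: {seconds:.3f} s")
```

Benchmarking with strings that look like your real input gives a more useful answer than a single sample.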

Upvotes: 0

Booboo

Reputation: 44108

A regex solution:

r'^([^[]+)(?:\[([^\]]+)])?$'

^ Matches start of string.
([^[]+) Capture group 1: matches 1 or more characters that are not '['.
(?: Start of non-capturing group.
\[ Matches '['.
([^\]]+) Capture group 2: matches 1 or more characters that are not ']'.
] Matches ']'.
) End of non-capturing group.
? Makes the non-capturing group optional.
$ Matches end of string.
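The breakdown above maps directly onto Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments. The sketch below is the same pattern rewritten that way; the name NAME_RE is just an illustrative choice.

```python
import re

# Same pattern as above; in VERBOSE mode whitespace outside character
# classes is ignored and '#' starts a comment
NAME_RE = re.compile(
    r"""
    ^               # start of string
    ([^[]+)         # group 1: one or more characters that are not '['
    (?:             # start of non-capturing group
        \[          # a literal '['
        ([^\]]+)    # group 2: one or more characters that are not ']'
        ]           # a literal ']'
    )?              # the whole bracketed part is optional
    $               # end of string
    """,
    re.VERBOSE,
)

for test in ["john", "john[doe]"]:
    m = NAME_RE.match(test)
    if m:
        print(test, "->", m[1], m[2])
```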

import re

tests = ['john', 'john[doe]']

for test in tests:
    m = re.match(r'^([^[]+)(?:\[([^\]]+)])?$', test)
    if m:
        print(test, '->', m[1], m[2])

Prints:

john -> john None
john[doe] -> john doe
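Since the question allows either "" or None for the missing part, the same pattern can be wrapped in a small function. This is a sketch, not part of the original answer: split_name is a hypothetical helper, it uses re.fullmatch (which anchors both ends, so ^ and $ are dropped) and Match.groups(default) to substitute a value for a group that did not take part in the match.

```python
import re

# fullmatch() must consume the whole string, so no ^ or $ anchors are needed
NAME_RE = re.compile(r"([^[]+)(?:\[([^\]]+)])?")

def split_name(s, default=""):
    """Return (outside, inside); inside is `default` when there are no brackets."""
    m = NAME_RE.fullmatch(s)
    if m is None:
        raise ValueError(f"unexpected format: {s!r}")
    # groups(default) replaces non-participating groups with `default`
    outside, inside = m.groups(default)
    return outside, inside

print(split_name("john"))             # ('john', '')
print(split_name("john[doe]"))        # ('john', 'doe')
print(split_name("john", None))       # ('john', None)
```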

Explanations

First, anything between parentheses ( ) is a capturing group, and anything between (?: ) is a non-capturing group. Either type of group can contain capturing and non-capturing groups within it.

[] is used to define a set of characters. For example, [aqw] matches 'a', 'q' or 'w', and [a-e] matches 'a', 'b', 'c', 'd' or 'e'. A leading ^ negates the set: [^aqw] matches any character other than 'a', 'q' or 'w'. So [^\]] matches any character other than ']' (you have to put a \ in front of the ] to "escape" it, because in that context ] has a special meaning: it would otherwise close the [] construct). The + that follows means "one or more of what precedes this". So ([^[]+) matches one or more of any character that is not a [.
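The character-set rules described above can be checked quickly with re.findall, which returns every non-overlapping match. The sample strings here are arbitrary illustrations.

```python
import re

# [aqw]: any one of 'a', 'q' or 'w'
print(re.findall(r"[aqw]", "aqwxyz"))      # ['a', 'q', 'w']
# [a-e]+: runs of characters in the range 'a'..'e'
print(re.findall(r"[a-e]+", "abcxyzde"))   # ['abc', 'de']
# [^aqw]+: runs of characters that are NOT 'a', 'q' or 'w'
print(re.findall(r"[^aqw]+", "aqwxyz"))    # ['xyz']
# [^\]]+: runs of characters that are not ']'
print(re.findall(r"[^\]]+", "john[doe]"))  # ['john[doe']
```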

I hope the preceding explanations help.

Upvotes: 4

Jolly9642

Reputation: 33

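There are probably better ways to do it, but this worked for me.

import re

s = "john[doe]"
arr = []
parts = re.split(r"\[", s)   # ['john', 'doe]']
arr.append(parts[0])
# guard: 'john' without brackets gives a single-element list
if len(parts) > 1:
    arr.append(re.split(r"\]", parts[1])[0])
print(arr)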
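A single re.split on either bracket can replace the two separate splits. This is a sketch under the question's assumption that there is at most one bracketed part at the end; split_brackets is a hypothetical helper name.

```python
import re

def split_brackets(s):
    # split on '[' or ']': 'john[doe]' -> ['john', 'doe', '']
    parts = re.split(r"[\[\]]", s)
    first = parts[0]
    # with no brackets re.split returns a single-element list: 'john' -> ['john']
    second = parts[1] if len(parts) > 1 else None
    return first, second

print(split_brackets("john[doe]"))  # ('john', 'doe')
print(split_brackets("john"))       # ('john', None)
```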

Upvotes: -1

PApostol

Reputation: 2292

Is it a requirement that you use regex for this? It's probably easier without:

if '[' in string:
    x, y = string.split('[')
    y = y.strip(']')
else:
    x, y = string, ''
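The same non-regex idea can be written without the if/else by using str.partition, which always returns a 3-tuple (head, separator, tail), with the separator and tail set to '' when '[' is absent. A minimal sketch, assuming at most one bracketed part at the end of the string; parse_name is a hypothetical helper name.

```python
def parse_name(s):
    # partition splits at the first '['; sep and tail are '' if there is none
    head, sep, tail = s.partition("[")
    if not sep:
        return head, None
    # drop the trailing ']' if it is there
    return head, tail[:-1] if tail.endswith("]") else tail

for sample in ["john", "john[doe]"]:
    print(sample, "->", parse_name(sample))
```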

A version that uses regex might look like this:

import re

if '[' in string:
    # raw string so that \[ reaches the regex engine unchanged
    x, y = re.findall(r'^(.+)\[(.+)?]', string)[0]
else:
    x, y = string, ''
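The [0] and the if guard follow from what re.findall returns for a pattern with groups: a list of tuples, with '' for an optional group that did not match, and an empty list when nothing matches. A short illustration with the same pattern:

```python
import re

PATTERN = r"^(.+)\[(.+)?]"

print(re.findall(PATTERN, "john[doe]"))  # [('john', 'doe')]
print(re.findall(PATTERN, "john[]"))     # [('john', '')]
print(re.findall(PATTERN, "john"))       # [] -> [0] would raise IndexError
```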

Upvotes: 0
