Reputation: 143
I have this in my file
import re
sample = """Name: @s
Owner: @a[tag=Admin]"""
target = r"@[sae](\[[\w{}=, ]*\])?"
regex = re.split(target, sample)
print(regex)
I want to split out every word that starts with @, like this:
["Name: ", "@s", "\nOwner: ", "@a[tag=Admin]"]
But instead it gives this:
['Name: ', None, '\nOwner: ', '[tag=Admin]', '']
How can I separate them correctly?
Upvotes: 3
Views: 549
Reputation: 163372
In your output you keep only the [tag=Admin] part, because that is the only part inside a capture group, and re.split can also return empty strings.
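As a minimal sketch of that behaviour, the snippet below runs the original pattern against each line separately. The variable names are only illustrative; the pattern is the one from the question.

```python
import re

# Pattern from the question: only the optional [...] part is captured.
target = r"@[sae](\[[\w{}=, ]*\])?"

# The group did not participate in the match, so re.split inserts None.
no_selector_args = re.split(target, "Name: @s")
# The group matched, so only its text (not the leading @a) is kept.
with_selector_args = re.split(target, "Owner: @a[tag=Admin]")

print(no_selector_args)    # ['Name: ', None, '']
print(with_selector_args)  # ['Owner: ', '[tag=Admin]', '']
```

The trailing empty string appears in both cases because the delimiter matches at the very end of the input.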
Another option is to be specific about the allowed data format and, instead of splitting, capture the parts in two groups.
(\s*\w+:\s*)(@[sae](?:\[[\w{}=, ]*])?)
The pattern matches:
(                     Capture group 1
  \s*\w+:\s*          Match 1+ word characters and : between optional whitespace chars
)                     Close group
(                     Capture group 2
  @[sae]              Match @ followed by either s, a or e
  (?:\[[\w{}=, ]*])?  Optionally match [...]
)                     Close group
Example code:
import re
sample = """Name: @s
Owner: @a[tag=Admin]"""
target = r"(\s*\w+:\s*)(@[sae](?:\[[\w{}=, ]*])?)"
listOfTuples = re.findall(target, sample)
lst = [s for tpl in listOfTuples for s in tpl]
print(lst)
Output
['Name: ', '@s', '\nOwner: ', '@a[tag=Admin]']
Upvotes: 0
Reputation: 241821
re.split expects the regular expression to match the delimiters in the string. It only returns the parts of the delimiters that are captured. In the case of your regex, that's only the bracketed part if present, and None otherwise.
If you want the whole delimiter to show up in the list, put parentheses around the whole regex:
target = r"(@[sae](\[[\w{}=, ]*\])?)"
But you are probably better off not capturing the interior group. You can change it to a non-capturing group by using (?:…) instead of (…):
target = r"(@[sae](?:\[[\w{}=, ]*\])?)"
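A minimal sketch of how that last pattern could be used end to end, assuming the sample string from the question. The final filtering step is an addition here, not part of the suggestion above: it drops the empty string that re.split leaves when a delimiter sits at the very end of the input.

```python
import re

sample = """Name: @s
Owner: @a[tag=Admin]"""

# One outer capture group keeps each whole delimiter in the result;
# the inner (?:...) group is non-capturing, so it adds no extra items.
target = r"(@[sae](?:\[[\w{}=, ]*\])?)"

raw_parts = re.split(target, sample)
# Drop empty pieces, such as the one after a match at the end of the string.
parts = [p for p in raw_parts if p]

print(raw_parts)  # ['Name: ', '@s', '\nOwner: ', '@a[tag=Admin]', '']
print(parts)      # ['Name: ', '@s', '\nOwner: ', '@a[tag=Admin]']
```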
Upvotes: 0
Reputation: 110685
If I understand the requirements correctly you could do that as follows:
import re
s = """Name: @s
Owner: @a[tag=Admin]
"""
rgx = r'(?=@.*)|(?=\r?\n[^@\r\n]*)'
print(re.split(rgx, s))  # splitting on zero-width matches needs Python 3.7+
#=> ['Name: ', '@s', '\nOwner: ', '@a[tag=Admin]', '\n']
The regular expression can be broken down as follows.
(?= # begin a positive lookahead
@.* # match '@' followed by >= 0 chars other than line terminators
) # end positive lookahead
| # or
(?= # begin a positive lookahead
\r?\n # match a line terminator
[^@\r\n]* # match >= 0 characters other than '@' and line terminators
) # end positive lookahead
Notice that matches are zero-width.
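To make the zero-width point concrete, here is a small sketch that lists where the lookaheads match in the same string. It uses re.finditer only for inspection, and the name positions is illustrative.

```python
import re

s = "Name: @s\nOwner: @a[tag=Admin]\n"
rgx = r'(?=@.*)|(?=\r?\n[^@\r\n]*)'

# For every match, start == end: the lookaheads consume no characters,
# so re.split cuts the string at these positions without removing anything.
positions = [(m.start(), m.end()) for m in re.finditer(rgx, s)]
print(positions)  # [(6, 6), (8, 8), (16, 16), (29, 29)]
```

Each pair marks a cut point: just before each @ and just before each line terminator, including the final one.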
Upvotes: 3
Reputation: 521639
I would use re.findall here:
import re
sample = """Name: @s
Owner: @a[tag=Admin]"""
parts = re.findall(r'@\w+(?:\[.*?\])?|\s*\S+\s*', sample)
print(parts) # ['Name: ', '@s', '\nOwner: ', '@a[tag=Admin]']
The regex pattern used here says to match:
@\w+ a tag @some_tag
(?:\[.*?\])? followed by an optional [...] term
| OR
\s*\S+\s* any other non whitespace term,
including optional whitespace on both sides
Upvotes: 3