Reputation: 73
I have a message that I am trying to split at each timestamp.
import re
message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."
split_message = re.split(r'[a-zA-Z]{3} (0[1-9]|[1-2][0-9]|3[0-1]), ([0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC', message)
print(split_message)
Expected Output:
["This is update 1", "This is update 2", "This is update 3"]
Actual Output:
['', '10', '17', 'This is update 1.', '10', '15', 'This is update 2.', '10', '15', 'This is update 3.']
Not sure what I am missing.
Upvotes: 3
Views: 80
Reputation: 2176
You are using capturing groups, which is why their contents also appear in the resulting list. Use non-capturing groups, written as (?:...), instead:
import re
message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."
split_message = re.split(r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC", message)
print(split_message)
However, you will always get an empty string as the first entry, because the message begins with a match, so nothing precedes the first separator:
['', 'This is update 1.', 'This is update 2.', 'This is update 3.']
As stated in the Python documentation for re.split:
If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
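To get exactly the expected output from the question, one option (a small post-processing sketch, not part of the original answer) is to skip the empty leading entry and strip each update's trailing period:

```python
import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

# Same timestamp pattern as above, using only non-capturing groups
timestamp = r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC"

# Skip empty pieces (such as the one before the first timestamp)
# and remove the trailing "." from each update
updates = [part.rstrip(".") for part in re.split(timestamp, message) if part]
print(updates)  # ['This is update 1', 'This is update 2', 'This is update 3']
```

Note that rstrip(".") removes every trailing period, so an update ending in "..." would lose all three.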
Upvotes: 4
Reputation: 894
This doesn't use regex, but I wanted to highlight how much plain Python string splitting can do for tasks like this. It causes far fewer headaches because it is easier to understand.
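message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."
# Each piece after a "UTC" marker starts with one update
values = message.split("UTC")
# Drop the text before the first "UTC" (the first timestamp)
values = values[1:]
# Keep only the text up to the first "." of each piece
result = [v.split(".")[0] for v in values]
print(result)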
Note: this may not work if your messages (e.g. "This is update 1.") contain more than one "." character, or if they contain the text "UTC" themselves.
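If the updates may contain extra "." characters, one regex-based alternative is to capture the text between timestamps with re.findall and a lookahead. This is a sketch assuming that every update is followed either by the next timestamp or by the end of the string, with an optional final period, and that updates contain no newlines; extract_updates and the dotted sample message are hypothetical, for illustration only:

```python
import re

# Same timestamp pattern as the first answer; it must contain only
# non-capturing groups, otherwise re.findall would return tuples
TIMESTAMP = r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC"

# Capture the shortest text after a timestamp that is followed by an
# optional "." and then either the next timestamp or the end of the string
UPDATE = re.compile(rf"{TIMESTAMP}(.*?)\.?(?={TIMESTAMP}|$)")


def extract_updates(text):
    """Hypothetical helper: return the update texts without their final period."""
    return UPDATE.findall(text)


message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."
# Hypothetical variant where one update contains an extra "."
dotted = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCVersion 1.2 is out.Aug 10, 15:55 UTCThis is update 3."

print(extract_updates(message))  # ['This is update 1', 'This is update 2', 'This is update 3']
print(extract_updates(dotted))   # ['This is update 1', 'Version 1.2 is out', 'This is update 3']
```

The lazy (.*?) keeps each capture from running past the next timestamp, while the lookahead checks for that timestamp without consuming it, so the next match can start there.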
Upvotes: 0