PritamYaduvanshi

Reputation: 73

Split string based on regex pattern

I have a message which I am trying to split.

import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

split_message = re.split(r'[a-zA-Z]{3} (0[1-9]|[1-2][0-9]|3[0-1]), ([0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC', message)

print(split_message)

Expected Output:

["This is update 1", "This is update 2", "This is update 3"]

Actual Output:

['', '10', '17', 'This is update 1.', '10', '15', 'This is update 2.', '10', '15', 'This is update 3.']

Not sure what I am missing.

Upvotes: 3

Views: 80

Answers (2)

Aram Becker

Reputation: 2176

You are using capturing groups, which is why their contents also end up in the result list. Use non-capturing groups instead (they begin with ?:):

import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

split_message = re.split(r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC", message)

print(split_message)

You will, however, always get an empty first entry, because there is an empty string in front of the first match of your split pattern:

['', 'This is update 1.', 'This is update 2.', 'This is update 3.']
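If you want exactly the list from the question, one option is to filter out the empty pieces and strip the trailing period afterwards. This is only a minimal sketch: the names TIMESTAMP and updates are my own, and rstrip(".") removes every trailing period, which is an assumption about what you want for messages ending in "...".

```python
import re

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."

# Same pattern as above, using only non-capturing groups.
TIMESTAMP = r"[a-zA-Z]{3} (?:0[1-9]|[1-2][0-9]|3[0-1]), (?:[0-1]?[0-9]|2[0-3]):[0-5][0-9] UTC"

parts = re.split(TIMESTAMP, message)

# Skip empty pieces (such as the one before the first timestamp)
# and remove trailing periods from each remaining piece.
updates = [part.rstrip(".") for part in parts if part]

print(updates)
# ['This is update 1', 'This is update 2', 'This is update 3']
```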

As stated in the docs:

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.
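To see that behaviour in isolation, here is a small sketch on a made-up string (the text "a1b2c" and the variable names are mine, not from the question). The only difference between the two calls is whether the group captures.

```python
import re

text = "a1b2c"

# Capturing group: each matched digit is inserted into the result.
with_capture = re.split(r"(\d)", text)

# Non-capturing group: only the text between the matches is returned.
without_capture = re.split(r"(?:\d)", text)

print(with_capture)     # ['a', '1', 'b', '2', 'c']
print(without_capture)  # ['a', 'b', 'c']
```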

Upvotes: 4

Jeremy Savage

Reputation: 894

This doesn't use regex, but I wanted to highlight the power of plain Python string splitting for tasks like this. It causes far fewer headaches because it is easier to understand.

message = "Aug 10, 17:04 UTCThis is update 1.Aug 10, 15:56 UTCThis is update 2.Aug 10, 15:55 UTCThis is update 3."
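values = message.split("UTC")  # split on the "UTC" that ends each timestamp
values = values[1:]  # drop the chunk before the first "UTC"
result = [v.split(".")[0] for v in values]  # keep the text up to the first "."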

Note: this will not work if your messages ("This is update 1.") contain more than one "." character, or if the message text itself contains "UTC".
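To make that caveat concrete, here is a sketch with a made-up message whose first update contains a version number (the sample text is my own invention, not from the question). The first update gets cut off at the first period:

```python
# Hypothetical input: the first update contains an extra "." inside "v1.2".
message = "Aug 10, 17:04 UTCUpgraded to v1.2.Aug 10, 15:56 UTCThis is update 2."

values = message.split("UTC")[1:]
result = [v.split(".")[0] for v in values]

print(result)
# ['Upgraded to v1', 'This is update 2']
```

If your updates can look like this, splitting on the timestamp pattern with re.split is probably the safer choice.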

Upvotes: 0
