Python parse comment by non string characters

Question

I am trying to split/parse comments that contain strings, numbers, and emojis (text emoticons such as :) or :O). I want generic code that splits each comment into separate parts wherever an emoji occurs.

For example:

comment_1 = "This is :) my comment :O"
comment_2 = ">:O Another comment to :v parse"

The output should be something like:

output_1 = ["This is", "my comment"]
output_2 = ["Another comment to", "parse"]

I have thought about splitting on special characters only, but that might leave behind the "O" from ":O" or the "v" from ":v".
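To make the concern above concrete, here is a minimal sketch of the "split on special characters" idea. It assumes "special characters" means anything that is neither a word character nor whitespace; `naive_split` is a hypothetical helper, not code from the question. The leftover letters from ":O" and ":v" survive as text:

```python
import re

def naive_split(comment):
    # Hypothetical naive approach: split on every run of characters that are
    # neither word characters (\w) nor whitespace (\s).
    parts = re.split(r"[^\w\s]+", comment)
    # Trim surrounding spaces and drop empty pieces.
    return [p.strip() for p in parts if p.strip()]

print(naive_split("This is :) my comment :O"))
# ['This is', 'my comment', 'O']
print(naive_split(">:O Another comment to :v parse"))
# ['O Another comment to', 'v parse']
```

Because the letter in ":O" or ":v" is itself a word character, splitting on punctuation alone cannot tell it apart from real text.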

Tim Biegeleisen · Accepted Answer

You may try matching on the pattern (?, which attempts to find any sequence of word terms, optionally ending in a single non-whitespace character (such as a punctuation character).


import re

inp = ["This is :) my comment :O", ">:O Another comment to :v parse"]
for i in inp:
    matches = re.findall(r'(?

This prints:
['This is', 'my comment']
['Another comment to', 'parse']

Here is an explanation of the regex pattern being used:
(?
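The pattern in the answer is cut off in this copy, so its exact form is not recoverable. As a sketch only, the following pattern follows the answer's description (a sequence of word terms, optionally ending in one punctuation character) and reproduces the printed output above. It is an assumed reconstruction, not necessarily the answerer's regex, and `extract_text` is a hypothetical helper name. `re.VERBOSE` lets each piece carry its own explanation:

```python
import re

# Assumed reconstruction of the described pattern (not necessarily the
# original, which is truncated): runs of whitespace-separated words.
PATTERN = re.compile(
    r"""
    (?<!\S)        # start at the beginning of the string or after whitespace
    \w+            # first word (letters, digits, underscore)
    (?:\s+\w+)*    # further words, each preceded by whitespace
    [^\w\s]?       # optional single trailing punctuation character, e.g. "!"
    (?!\S)         # end at whitespace or at the end of the string
    """,
    re.VERBOSE,
)

def extract_text(comment):
    """Hypothetical helper: return the text runs between emoticons."""
    return PATTERN.findall(comment)

for comment in ["This is :) my comment :O", ">:O Another comment to :v parse"]:
    print(extract_text(comment))
# ['This is', 'my comment']
# ['Another comment to', 'parse']
```

Because every match must begin and end at a whitespace boundary, the "O" in ":O" and the "v" in ":v" (which are preceded by ":") can never start a match, which avoids the problem raised in the question. Digits count as word characters, so "I rate it 10 :)" yields ['I rate it 10'], and "Nice work! :)" yields ['Nice work!'].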
