Split based on Regex python

Question

I have a string like the one below:

"‘‘Apple’’ It is create by Steve Jobs (He was fired and get hired) ‘‘Microsoft’’ Bill Gates was the richest man in the world ‘‘Oracle’’ It is a database company"

I managed to use the regex ‘‘(.*?)’’ to capture the groups Apple, Microsoft and Oracle, but how should I extract the remaining sentence parts into a list?

What I want to create

companyList = ['Apple','Microsoft','Oracle']

descriptionList = ['It is create by Steve Jobs (He was fired and get hired)','Bill Gates was the richest man in the world','It is a database company']

Thank you in advance

Tim Biegeleisen · Accepted Answer

One option is to use re.findall with the following pattern:

‘‘(.*?)’’ (.*?)(?= ‘‘|$)

This captures the company name and the description in separate groups for each match found in the input. The lookahead (?= ‘‘|$) marks where the current description ends: either just before the opening quotes of the next entry or at the end of the input. Without it, the lazy (.*?) would match as little as possible, which is an empty string.

import re

inp = "‘‘Apple’’ It is create by Steve Jobs (He was fired and get hired) ‘‘Microsoft’’ Bill Gates was the richest man in the world ‘‘Oracle’’ It is a database company"
# With two capture groups, findall returns one (company, description) tuple per match
matches = re.findall('‘‘(.*?)’’ (.*?)(?= ‘‘|$)', inp)
companyList = [row[0] for row in matches]
descriptionList = [row[1] for row in matches]
print(companyList)
print(descriptionList)

This prints:

['Apple', 'Microsoft', 'Oracle']
['It is create by Steve Jobs (He was fired and get hired)',
 'Bill Gates was the richest man in the world', 'It is a database company']
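To see why the lookahead matters, here is a small sketch that runs the same pattern with and without it on the input from the question. The variable names no_lookahead and with_lookahead are made up for illustration and are not part of the answer above.

```python
import re

inp = ("‘‘Apple’’ It is create by Steve Jobs (He was fired and get hired) "
       "‘‘Microsoft’’ Bill Gates was the richest man in the world "
       "‘‘Oracle’’ It is a database company")

# Without the lookahead, the lazy (.*?) is satisfied by matching nothing,
# so every description comes back as an empty string.
no_lookahead = re.findall(r'‘‘(.*?)’’ (.*?)', inp)

# With the lookahead, (.*?) has to keep expanding until either " ‘‘"
# (the start of the next entry) or the end of the input follows.
with_lookahead = re.findall(r'‘‘(.*?)’’ (.*?)(?= ‘‘|$)', inp)

print(no_lookahead)       # [('Apple', ''), ('Microsoft', ''), ('Oracle', '')]
print(with_lookahead[0])  # ('Apple', 'It is create by Steve Jobs (He was fired and get hired)')
```

Note that . does not match newlines by default, so if a description could span several lines, passing flags=re.DOTALL would probably be needed as well.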
