Reputation: 877
I have a string like the one below:
"‘‘Apple’’ It is create by Steve Jobs (He was fired and get hired) ‘‘Microsoft’’ Bill Gates was the richest man in the world ‘‘Oracle’’ It is a database company"
I can use a regex to capture Apple, Microsoft and Oracle as groups with ‘‘(.*?)’’,
but how can I extract the remaining sentence parts into a list?
This is what I want to create:
companyList = ['Apple','Microsoft','Oracle']
descriptionList = ['It is create by Steve Jobs (He was fired and get hired)','Bill Gates was the richest man in the world','It is a database company']
Thank you in advance
Upvotes: 1
Views: 48
Reputation: 522817
One option is to use re.findall with the following pattern:
‘‘(.*?)’’ (.*?)(?= ‘‘|$)
This captures the company name and the description in separate groups for each match found in the input. Note that the lookahead (?= ‘‘|$) marks the end of the current description, which occurs either at the start of the next entry or at the end of the input.
import re

inp = "‘‘Apple’’ It is create by Steve Jobs (He was fired and get hired) ‘‘Microsoft’’ Bill Gates was the richest man in the world ‘‘Oracle’’ It is a database company"
matches = re.findall('‘‘(.*?)’’ (.*?)(?= ‘‘|$)', inp)
companyList = [row[0] for row in matches]
descriptionList = [row[1] for row in matches]
print(companyList)
print(descriptionList)
This prints:
['Apple', 'Microsoft', 'Oracle']
['It is create by Steve Jobs (He was fired and get hired)', 'Bill Gates was the richest man in the world', 'It is a database company']
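To see what the lookahead contributes, here is a minimal sketch that runs the same pattern with and without it on the string from the question. The variable names without_lookahead and with_lookahead are only illustrative. Without the lookahead, the lazy (.*?) for the description can match the empty string, so every description should come back empty.

```python
import re

inp = ("‘‘Apple’’ It is create by Steve Jobs (He was fired and get hired) "
       "‘‘Microsoft’’ Bill Gates was the richest man in the world "
       "‘‘Oracle’’ It is a database company")

# Without the lookahead, nothing forces the lazy (.*?) to consume any text,
# so it stops immediately after the space that follows the closing quotes.
without_lookahead = re.findall('‘‘(.*?)’’ (.*?)', inp)

# With the lookahead, (.*?) has to keep expanding until it is followed
# by " ‘‘" (the start of the next entry) or by the end of the input.
with_lookahead = re.findall('‘‘(.*?)’’ (.*?)(?= ‘‘|$)', inp)

print(without_lookahead)
print(with_lookahead)
```

A lazy quantifier only matches as much as the rest of the pattern requires, so it is the lookahead that gives the description group something to stop at.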
Upvotes: 3