Reputation: 41
Given a string that follows this structure:
" (subsidiary of <holding_company>) <post_>"
where
Example string: " google (subsidiary of alphabet (inc.)) xyz"
How can I extract the holding company name using a regex?
Upvotes: 1
Views: 129
Reputation: 3593
The regular expression to extract that is as follows:
"subsidiary of\s+(.*)\)\s+\S+"
In Python 2, you'd do something like:
import re
regex = r"subsidiary of\s+(.*)\)\s+\S+"
test_str = "\" (subsidiary of <holding_company>) <post_>\""
m = re.search(regex, test_str)
if m:
    # if the pattern was found, the company name is in group(1)
    print(m.group(1))
See it in action here: https://repl.it/repls/ShyFocusedInstructions#main.py
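To see why this pattern copes with the nested parentheses in the question's example, here is a minimal Python 3 sketch that runs the same regex on that string. The helper name `holding_company` is only an illustration and is not part of the answer above.

```python
import re

# Greedy (.*) first runs to the end of the string, then backtracks to the
# last ")" that is followed by whitespace and at least one non-space character.
PATTERN = re.compile(r"subsidiary of\s+(.*)\)\s+\S+")


def holding_company(text):
    """Return the holding company name, or None if the pattern is absent.

    Hypothetical helper, named here only for illustration.
    """
    m = PATTERN.search(text)
    return m.group(1) if m else None


# Example string taken from the question.
print(holding_company(" google (subsidiary of alphabet (inc.)) xyz"))
# prints: alphabet (inc.)
```

Because the match is greedy, it only behaves this way when the trailing text contains no further closing parenthesis followed by more text. An input such as " (subsidiary of a) x (y) z" would capture "a) x (y" instead of "a".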
Upvotes: 2
Reputation: 1269
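This creates capture groups for your holding company and post. The character class [\w<>]* only matches word characters and angle brackets, so you may need to expand the regex to include additional special characters (spaces, dots, parentheses) if they can appear in the company name. Here's the regex on regex101 if you need to expand it: https://regex101.com/r/xpVfqU/1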
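#!/usr/bin/python3
import re

# Group 1 is the holding company, group 2 is the trailing post text.
pattern = re.compile(r'\s\(subsidiary\ of\ ([\w<>]*)\)\s*(.*)')
# Named "text" rather than "str" to avoid shadowing the built-in type.
text = " (subsidiary of <holding_company>) <post_>"
holding_company = pattern.sub(r'\1', text)
post = pattern.sub(r'\2', text)
print(holding_company)
print(post)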
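If the company name can contain spaces, dots, or parentheses, as in the question's example, one possible way to widen the pattern is sketched below. It assumes the post part is a single whitespace-free token at the end of the string. The function name `split_subsidiary` and the group names `holding` and `post` are made up for this illustration.

```python
import re

# Assumed layout: "... (subsidiary of <holding_company>) <post>", where
# <post> is taken to be one whitespace-free token at the end of the string.
PATTERN = re.compile(
    r"\(subsidiary of\s+(?P<holding>.*)\)\s+(?P<post>\S+)\s*$"
)


def split_subsidiary(text):
    """Return (holding_company, post), or None when the text does not match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return m.group("holding"), m.group("post")


print(split_subsidiary(" google (subsidiary of alphabet (inc.)) xyz"))
# prints: ('alphabet (inc.)', 'xyz')
print(split_subsidiary(" (subsidiary of <holding_company>) <post_>"))
# prints: ('<holding_company>', '<post_>')
```

Using re.search with groups instead of re.sub also makes a failed match explicit. re.sub returns the input string unchanged when nothing matches, which is easy to mistake for a result.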
Upvotes: 1