schwillr

Reputation: 41

Using regex to extract holding company

Given a string with the following structure:

" (subsidiary of <holding_company>) <post_>"

where

Example string: " google (subsidiary of alphabet (inc.)) xyz"

How can I extract the holding company name using a regex?

Upvotes: 1

Views: 129

Answers (3)

sal

Reputation: 3593

The regular expression to extract that is as follows:

"subsidiary of\s+(.*)\)\s+\S+"

In Python 2, you could do something like:

import re
regex = r"subsidiary of\s+(.*)\)\s+\S+"
test_str = "\" (subsidiary of <holding_company>) <post_>\""

m = re.search(regex, test_str)

if m:
  # if it found the pattern, the company name is in group(1)
  print m.group(1)

See it in action here: https://repl.it/repls/ShyFocusedInstructions#main.py
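For readers on Python 3, here is a minimal sketch of the same pattern applied to the example string from the question. The function name `extract_holding_company` and the constant `HOLDING_RE` are made up for illustration. It assumes the text after the closing parenthesis does not itself contain a ")" followed by whitespace and another token, since the greedy `(.*)` would then run past the company name.

```python
import re

# Same pattern as in the answer above. The greedy (.*) extends to the last ")"
# that is followed by whitespace and a non-space token.
HOLDING_RE = re.compile(r"subsidiary of\s+(.*)\)\s+\S+")


def extract_holding_company(text):
    """Return the holding company name, or None if the pattern is absent."""
    m = HOLDING_RE.search(text)
    return m.group(1) if m else None


example = " google (subsidiary of alphabet (inc.)) xyz"
print(extract_holding_company(example))  # alphabet (inc.)
```

Because the match is greedy, the nested "(inc.)" stays inside the captured name.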

Upvotes: 2

Nick

Reputation: 1269

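This creates capture groups for the holding company and the post. The holding-company group only matches word characters and angle brackets ([\w<>]), so you may need to expand the character class to allow spaces, parentheses, or other special characters. Here's the regex on regex101 if you need to expand it: https://regex101.com/r/xpVfqU/1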

#!/usr/bin/python3

import re

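# Named "text" rather than "str" to avoid shadowing the built-in type.
text = " (subsidiary of <holding_company>) <post_>"

# Group 1 is the holding company; group 2 is everything after the closing parenthesis.
pattern = re.compile(r'\s\(subsidiary of ([\w<>]*)\)\s*(.*)')

holding_company = pattern.sub(r'\1', text)
post = pattern.sub(r'\2', text)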

print(holding_company)
print(post)
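With the [\w<>]* character class, the example string from the question (" google (subsidiary of alphabet (inc.)) xyz") does not match, because the name contains a space and parentheses, so re.sub returns the input unchanged. One possible way to widen it is sketched below, using a single search with named groups instead of two substitutions. The pattern, the group names `holding` and `post`, and the function name are illustrative assumptions. The sketch also assumes the post part contains no ")" of its own, since the greedy match ends at the last ")" in the string.

```python
import re

# Hypothetical widened pattern: the greedy (.*) lets the company name contain
# spaces and nested parentheses; it ends at the last ")" in the string.
PATTERN = re.compile(r"\(subsidiary of\s+(?P<holding>.*)\)\s*(?P<post>.*)")


def split_holding_and_post(text):
    """Return (holding_company, post), or None if the text does not match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return m.group("holding"), m.group("post")


print(split_holding_and_post(" google (subsidiary of alphabet (inc.)) xyz"))
# ('alphabet (inc.)', 'xyz')
```

Using search and checking for None also makes a non-match explicit, whereas re.sub silently hands back the original string.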

Upvotes: 1

Bendik Knapstad

Reputation: 1458

This should get you there:

(?<=\(subsidiary of )(.*)(?=\) )
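
A minimal Python sketch of the lookaround approach, for illustration only. The lookbehind here includes the space after "of", so the match does not begin with a blank. The names `LOOKAROUND_RE` and `holding_via_lookaround` are made up. Note that the lookahead requires a space after the closing parenthesis, so a string that ends right at ")" will not match.

```python
import re

# The lookbehind and lookahead assert the surrounding text without consuming
# it, so the whole match (group 0) is just the company name.
LOOKAROUND_RE = re.compile(r"(?<=\(subsidiary of )(.*)(?=\) )")


def holding_via_lookaround(text):
    """Return the holding company name, or None if the pattern is absent."""
    m = LOOKAROUND_RE.search(text)
    return m.group(0) if m else None


print(holding_via_lookaround(" google (subsidiary of alphabet (inc.)) xyz"))
# alphabet (inc.)
```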

Upvotes: 1
