Reputation: 820
I have the following text from which I need to extract certain phrases:
Restricted Cash 951 37505 Accounts Receivable - Affiliate 31613 27539 Accounts
Receivable - Third Party 23091 2641 Crude Oil Inventory 2200 0 Other Current
Assets 2724 389
Total Current Assets 71319 86100 Property Plant and Equipment Total Property
Plant and Equipment Gross 1500609 706039 Less Accumulated
Depreciation and Amortization (79357) (44271) Total Property Plant and Equipment
Net 1421252 661768 Intangible Assets Net 310202 0 Goodwill 109734 0 Investments
82317 80461 Other Noncurrent Assets 3093 1429 Total Assets 1997917 829758
LIABILITIES Current Liabilities Accounts Payable - Affiliate 2778 1616 Accounts
Payable - Trade 92756 109893 Other Current Liabilities 9217 2876 Total Current
Liabilities 104751 114385 Long-Term Liabilities Long-Term Debt 559021 85000
Asset Retirement Obligations 17330 10416 Other Long-Term Liabilities 582 3727
Total Liabilities 681684 213528 EQUITY Partners' Equity Limited Partner
Common Units (23759 and 23712 units outstanding respectively) 699866 642616
Subordinated Units (15903 units outstanding) (130207) (168136) General Partner 2421 520
Total Partners' Equity 572080 475000 Noncontrolling Interests 744153 141230 Total
Equity 1316233 616230 Total Liabilities and Equity 1997917 829758
I need to remove every phrase that is enclosed in parentheses, i.e. (), and that also contains a number together with the word outstanding or units.
Based on these conditions, there are two phrases that need to be removed:
(23759 and 23712 units outstanding respectively)
(15903 units outstanding)
I have tried the following Regex in Python:
\(\d+.+?(outstanding)+?\)
The idea was that .+? after \d+ would make the regex non-greedy (lazy). However, the regex selects a huge segment, starting at (79357) (44271) Total Property Plant and Equipment and running until outstanding), which looks greedy.
The unique marker here is the word outstanding. Maybe there is a better approach to extracting those phrases?
Upvotes: 1
Views: 28
Reputation: 626870
You may use
\(\d[^()]*outstanding[^()]*\)
Details
\( - a ( char
\d - a digit
[^()]* - 0+ chars other than ( and )
outstanding - a substring
[^()]* - 0+ chars other than ( and )
\) - a ) char.
Python:
re.findall(r'\(\d[^()]*outstanding[^()]*\)', s)
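As a minimal sketch of why the original attempt over-matches and how the pattern above can be used for removal: a lazy quantifier only limits how far a match extends to the right, while the regex engine still starts at the leftmost position where a match is possible, here the first ( followed by a digit. The sample string below is a shortened excerpt of the text in the question, and the variable names (lazy, fixed, cleaned) are illustrative only. Prefixing the pattern with \s* so that the preceding space is removed too is an assumption about the desired output.

```python
import re

# Shortened excerpt of the balance-sheet text from the question.
s = ("Less Accumulated Depreciation and Amortization (79357) (44271) "
     "Total Property Plant and Equipment Net 1421252 661768 "
     "Common Units (23759 and 23712 units outstanding respectively) 699866 642616 "
     "Subordinated Units (15903 units outstanding) (130207) (168136) General Partner 2421 520")

lazy = r'\(\d+.+?(outstanding)+?\)'       # pattern from the question
fixed = r'\(\d[^()]*outstanding[^()]*\)'  # pattern from the answer

# The lazy pattern begins at the leftmost "(" + digit, i.e. "(79357",
# and .+? is free to cross ")" and "(" until "outstanding)" is found.
first = re.search(lazy, s).group(0)
print(first)

# [^()]* cannot cross a parenthesis, so each match stays inside one pair.
matches = re.findall(fixed, s)
print(matches)

# Remove the matched phrases together with the whitespace before them.
cleaned = re.sub(r'\s*' + fixed, '', s)
print(cleaned)
```

Since the goal in the question is removal, re.sub with an empty replacement may be a closer fit than re.findall. Purely numeric groups such as (130207) are left untouched because they do not contain outstanding.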
Upvotes: 1