Python: how to do this complex multiline regex involving escapes?

Question

I have a file that looks like this:

...

- family:
  - home: house
    location: 53rd street|Austin|Texas|U.S
    type: old
original entry: '544'
  issues:
  - plumbing: fixed
    ref:
    - id: 28
      cost: 23 USD

- family:
  - home: house
    location: 53rd street|Austin|Texas|U.S
    type: old
original entry: '545'
  issues:
  - plumbing: fixed
    ref:
    - id: 1081
      cost: 33 USD

 ...

This file has hundreds of similar entries on other families.

I want to make it look like this:

- family:
  - home: house
    location: 53rd street|Austin|Texas|U.S
    type: old
original entry: '544'
  issues:
  - plumbing: fixed
    ref:
    - id: 28
      cost: 23 USD
    - id: 1081
      cost: 33 USD

I have tried making a multiline regex where I just find the text in the middle and replace it with nothing. Here is the pattern I attempted:

pattern = "r'\s- family:
\s+- home: house
\s+tag: 53rd street|Austin|Texas|U.S
\s+type: old
\original entry: \'554\'
\s+issues:
\s+- plumbing: fixed
\s+ref:'"

This did not seem to work. I tried one of those online regex tools that suggested:

pattern = "r'\s- family:
\s+- home: house
\s+tag: 53rd street\|Austin\|Texas\|U.S
\s+type: old
\original entry: '554'
\s+issues:
\s+- plumbing: fixed
\s+ref:'"

This also did not appear to work. I have used my multiline regex function on simpler cases without a problem, so I know the regex code itself works. It is just that it seems a bit tricky getting a pattern that works.

I suspect some characters are not being escaped correctly, or are being escaped too much. Also, this strategy does not seem to get both of the original entry numbers after each other.
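On the escaping point, a few things in the attempts above look suspicious. The `r` prefix sits inside the quotes, so it is part of the string rather than a raw-string marker. A plain double-quoted string cannot span several lines. An unescaped `|` means alternation in a regex. The attempts also say `tag:` and `554` where the file says `location:` and `545`. A minimal sketch that sidesteps hand-escaping by letting `re.escape` do it, assuming the header text is exactly as in the sample (the names `header_lines` and `drop_second_header` are made up for illustration):

```python
import re

# Literal text of the second entry's header, one item per line of the file,
# copied from the sample above (indentation left out on purpose).
header_lines = [
    "- family:",
    "- home: house",
    "location: 53rd street|Austin|Texas|U.S",
    "type: old",
    "original entry: '545'",
    "issues:",
    "- plumbing: fixed",
    "ref:",
]

# re.escape() backslash-escapes '|', '.', and any other metacharacter,
# so each line is matched literally. The lines are then joined with \s+,
# which absorbs the newline and whatever indentation follows it.
pattern = re.compile(
    r"\n\s*" + r"\s+".join(re.escape(line) for line in header_lines)
)


def drop_second_header(text):
    """Delete the matched header so the ids after it join the previous 'ref:' list."""
    return pattern.sub("", text)
```

Calling `drop_second_header` on the file contents should remove only the header of entry '545', because that number is part of the literal text being matched.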

Is there a way this can be done? I guess one could just use the entire two blocks as the pattern and the merged result as the replacement text, but that seems even bulkier and more difficult...
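For hundreds of entries, one alternative to a single giant pattern is to split the file on blank lines and group the entries. This is a split-and-group approach rather than a pure regex substitution. The sketch below assumes entries are separated by blank lines, every entry to merge contains a `ref:` line, and two entries belong together when their headers are identical apart from the entry number. `merge_entries` is a hypothetical helper name.

```python
import re


def merge_entries(text):
    """Merge the ref items of entries whose headers differ only in the entry number."""
    merged = {}  # normalised header -> [first header seen, list of ref-item chunks]
    for n, block in enumerate(re.split(r"\n\s*\n", text.strip("\n"))):
        # Everything up to and including the 'ref:' line is the header.
        head, sep, items = block.partition("ref:\n")
        if sep:
            # Blank out the entry number so '544' and '545' compare equal.
            key = re.sub(r"original entry: '\d+'", "original entry: 'N'", head)
        else:
            # Blocks of any other shape get a unique key and pass through untouched.
            key = ("other", n)
        if key in merged:
            merged[key][1].append(items)
        else:
            merged[key] = [head + sep, [items]]
    # Dicts keep insertion order, so entries come out in their original order.
    return "\n\n".join(
        header + "\n".join(chunk.rstrip("\n") for chunk in chunks)
        for header, chunks in merged.values()
    ) + "\n"
```

The merged entry keeps the header, and therefore the entry number, of the first entry in each group, which matches the desired output shown earlier.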
