william007

Reputation: 18545

Replace with multi line regex

Given the following text, I want to remove the entire data_augmentation_options { random_horizontal_flip { ... } } block (in the samples below, ... stands for other text).

That is, the input is:

  ...
  batch_size: 4
  num_steps: 30
  data_augmentation_options {
    random_horizontal_flip {
      keypoint_flip_permutation: 0
      keypoint_flip_permutation: 2
      keypoint_flip_permutation: 1
      keypoint_flip_permutation: 4
      keypoint_flip_permutation: 3
      keypoint_flip_permutation: 6
      keypoint_flip_permutation: 5
      keypoint_flip_permutation: 8
      keypoint_flip_permutation: 7
      keypoint_flip_permutation: 10
      keypoint_flip_permutation: 9
      keypoint_flip_permutation: 12
      keypoint_flip_permutation: 11
      keypoint_flip_permutation: 14
      keypoint_flip_permutation: 13
      keypoint_flip_permutation: 16
      keypoint_flip_permutation: 15
    }
  }

  data_augmentation_options {
    random_crop_image {
      min_aspect_ratio: 0.5
      max_aspect_ratio: 1.7
      random_coef: 0.25
    }
  }

  ...

expected output is:

  ...
  batch_size: 4
  num_steps: 30

  data_augmentation_options {
    random_crop_image {
      min_aspect_ratio: 0.5
      max_aspect_ratio: 1.7
      random_coef: 0.25
    }
  }
  ...

I tried

s=''' ...
      batch_size: 4
      num_steps: 30
      data_augmentation_options {
        random_horizontal_flip {
          keypoint_flip_permutation: 0
          keypoint_flip_permutation: 2
          keypoint_flip_permutation: 1
          keypoint_flip_permutation: 4
          keypoint_flip_permutation: 3
          keypoint_flip_permutation: 6
          keypoint_flip_permutation: 5
          keypoint_flip_permutation: 8
          keypoint_flip_permutation: 7
          keypoint_flip_permutation: 10
          keypoint_flip_permutation: 9
          keypoint_flip_permutation: 12
          keypoint_flip_permutation: 11
          keypoint_flip_permutation: 14
          keypoint_flip_permutation: 13
          keypoint_flip_permutation: 16
          keypoint_flip_permutation: 15
        }
      }
    
      data_augmentation_options {
        random_crop_image {
          min_aspect_ratio: 0.5
          max_aspect_ratio: 1.7
          random_coef: 0.25
        }
      }
      ...
'''
print(re.sub('data_augmentation_options \{[\s]+random_horizontal_flip[\s]+\{[\s]+(keypoint_flip_permutation: \d[\s])+[\s]+\}[\s]+\}','',s,flags=re.S))

It does not seem to work. What is the right way to achieve this?

Upvotes: 0

Views: 69

Answers (3)

The fourth bird

Reputation: 163352

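You are only matching a single keypoint_flip_permutation line instead of all of them: the group (keypoint_flip_permutation: \d[\s])+ cannot repeat, because the next line starts with indentation, and \d accepts only one digit.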

You can repeat a group that matches lines of the form keypoint_flip_permutation: \d+, and then match the two closing curly braces.

Note that you don't need re.S as there is no dot in the pattern.

data_augmentation_options {\s+random_horizontal_flip\s+{(?:\s+keypoint_flip_permutation: \d+)+\s*}\s*}\s*

Explanation

  • data_augmentation_options { Match literally
  • \s+random_horizontal_flip\s+ match the starting line
  • { Match literally
  • (?: Non capture group
    • \s+keypoint_flip_permutation: \d+ Match 1+ whitespace chars, then the literal string followed by 1+ digits
  • )+ Repeat 1+ times
  • \s*} Match optional whitespace chars and }
  • \s*} Match optional whitespace chars and }
  • \s* Match optional whitespace chars

If you want to remove only the trailing newline rather than all following whitespace, you can match \r?\n at the end instead of \s*


For example:

print(re.sub(r"data_augmentation_options {\s+random_horizontal_flip\s+{(?:\s+keypoint_flip_permutation: \d+)+\s*}\s*}\s*", "", s))
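To try the pattern above end to end, here is a minimal sketch. The sample text is a shortened, hypothetical stand-in for the config in the question, and the variable names are only for illustration. It also shows the \r?\n variant, which leaves the blank line between the two blocks in place.

```python
import re

# Shortened, hypothetical stand-in for the config text in the question.
s = """\
batch_size: 4
num_steps: 30
data_augmentation_options {
  random_horizontal_flip {
    keypoint_flip_permutation: 0
    keypoint_flip_permutation: 2
    keypoint_flip_permutation: 10
  }
}

data_augmentation_options {
  random_crop_image {
    min_aspect_ratio: 0.5
  }
}
"""

# Common prefix: the two opening lines, then one or more permutation
# lines, then both closing braces.
body = (
    r"data_augmentation_options {\s+random_horizontal_flip\s+{"
    r"(?:\s+keypoint_flip_permutation: \d+)+\s*}\s*}"
)

# Variant 1: also swallow all whitespace after the block.
result = re.sub(body + r"\s*", "", s)

# Variant 2: swallow only the newline that ends the closing brace line.
result_keep_blank = re.sub(body + r"\r?\n", "", s)

print(result)
print(result_keep_blank)
```

With the first variant the remaining block follows num_steps directly; with the second, one empty line separates them.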

Upvotes: 2

dawg

Reputation: 103844

You can use a lookahead to stop the deletion at the second pattern:

>>> re.sub(r'^[ \t]*data_augmentation_options[\s\S]+?(?=^[ \t]*data_augmentation_options)','\n\n',s, flags=re.M)

        batch_size: 4
        num_steps: 30
        

        data_augmentation_options {
            random_crop_image {
                min_aspect_ratio: 0.5
                max_aspect_ratio: 1.7
                random_coef: 0.25
            }
        }

The (?=^[ \t]*data_augmentation_options) is the lookahead.
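As a runnable sketch of the lookahead approach (the sample text and variable names are hypothetical, shortened from the question): note that this pattern does not inspect the block's contents. It removes any data_augmentation_options block that is followed by another one, and it leaves a block alone when no further block follows.

```python
import re

# Shortened, hypothetical stand-in for the config text in the question.
s = """\
batch_size: 4
num_steps: 30
data_augmentation_options {
  random_horizontal_flip {
    keypoint_flip_permutation: 0
    keypoint_flip_permutation: 2
  }
}

data_augmentation_options {
  random_crop_image {
    min_aspect_ratio: 0.5
  }
}
"""

# re.M lets ^ match at the start of every line. The lazy [\s\S]+? grows
# until the lookahead sees the next data_augmentation_options line.
pattern = re.compile(
    r"^[ \t]*data_augmentation_options[\s\S]+?"
    r"(?=^[ \t]*data_augmentation_options)",
    flags=re.M,
)

result = pattern.sub("\n\n", s)
print(result)

# With only one block there is nothing for the lookahead to find,
# so the text is returned unchanged.
only_one = "data_augmentation_options {\n  random_horizontal_flip {\n  }\n}\n"
unchanged = pattern.sub("\n\n", only_one)
```

If the block to delete might be the last one in the file, or if other block types must be kept, a pattern that names random_horizontal_flip explicitly is the safer choice.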


Upvotes: 0

azro

Reputation: 54148

A few modifications to your regex give:

data_augmentation_options {\s+random_horizontal_flip\s+{(\s+keypoint_flip_permutation:\s\d+\s)+\s+}\s+}
  • replace [\s] by just \s, which is equivalent
  • put the \s+ inside the capture group ()
  • replace \d by \d+ to match multi-digits numbers
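A small side-by-side sketch, assuming a shortened, hypothetical sample of the config (the variable names are made up for illustration): the pattern from the question leaves the text untouched, while the modified one removes the block. The modified pattern does not consume whitespace after the final }, so the blank lines that followed the block stay in the output.

```python
import re

# Shortened, hypothetical stand-in for the config text in the question.
s = """\
batch_size: 4
num_steps: 30
data_augmentation_options {
  random_horizontal_flip {
    keypoint_flip_permutation: 0
    keypoint_flip_permutation: 2
    keypoint_flip_permutation: 10
  }
}

data_augmentation_options {
  random_crop_image {
    min_aspect_ratio: 0.5
  }
}
"""

# Pattern from the question (written as a raw string here).
original = (
    r"data_augmentation_options \{[\s]+random_horizontal_flip[\s]+\{[\s]+"
    r"(keypoint_flip_permutation: \d[\s])+[\s]+\}[\s]+\}"
)

# Modified pattern: leading \s+ moved inside the group, \d changed to \d+.
modified = (
    r"data_augmentation_options {\s+random_horizontal_flip\s+{"
    r"(\s+keypoint_flip_permutation:\s\d+\s)+\s+}\s+}"
)

before = re.sub(original, "", s)  # no match, so nothing is removed
after = re.sub(modified, "", s)   # the random_horizontal_flip block is gone

print(before == s)
print(after)
```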

Upvotes: 0
