Reputation: 18545
Given the following text, I want to remove everything in data_augmentation_options{random_horizontal_flip {..}}
(... stands for other text in the following),
i.e., the input is:
...
batch_size: 4
num_steps: 30
data_augmentation_options {
random_horizontal_flip {
keypoint_flip_permutation: 0
keypoint_flip_permutation: 2
keypoint_flip_permutation: 1
keypoint_flip_permutation: 4
keypoint_flip_permutation: 3
keypoint_flip_permutation: 6
keypoint_flip_permutation: 5
keypoint_flip_permutation: 8
keypoint_flip_permutation: 7
keypoint_flip_permutation: 10
keypoint_flip_permutation: 9
keypoint_flip_permutation: 12
keypoint_flip_permutation: 11
keypoint_flip_permutation: 14
keypoint_flip_permutation: 13
keypoint_flip_permutation: 16
keypoint_flip_permutation: 15
}
}
data_augmentation_options {
random_crop_image {
min_aspect_ratio: 0.5
max_aspect_ratio: 1.7
random_coef: 0.25
}
}
...
expected output is:
...
batch_size: 4
num_steps: 30
data_augmentation_options {
random_crop_image {
min_aspect_ratio: 0.5
max_aspect_ratio: 1.7
random_coef: 0.25
}
}
...
I tried
s=''' ...
batch_size: 4
num_steps: 30
data_augmentation_options {
random_horizontal_flip {
keypoint_flip_permutation: 0
keypoint_flip_permutation: 2
keypoint_flip_permutation: 1
keypoint_flip_permutation: 4
keypoint_flip_permutation: 3
keypoint_flip_permutation: 6
keypoint_flip_permutation: 5
keypoint_flip_permutation: 8
keypoint_flip_permutation: 7
keypoint_flip_permutation: 10
keypoint_flip_permutation: 9
keypoint_flip_permutation: 12
keypoint_flip_permutation: 11
keypoint_flip_permutation: 14
keypoint_flip_permutation: 13
keypoint_flip_permutation: 16
keypoint_flip_permutation: 15
}
}
data_augmentation_options {
random_crop_image {
min_aspect_ratio: 0.5
max_aspect_ratio: 1.7
random_coef: 0.25
}
}
...
'''
print(re.sub('data_augmentation_options \{[\s]+random_horizontal_flip[\s]+\{[\s]+(keypoint_flip_permutation: \d[\s])+[\s]+\}[\s]+\}','',s,flags=re.S))
It does not seem to work, what's the right way to achieve this?
Upvotes: 0
Views: 69
Reputation: 163352
You are only matching a single line instead of all the lines.
You can repeat the line pattern \s+keypoint_flip_permutation: \d+ (with \d+ so that multi-digit values such as 10 also match)
and then match the two closing curly braces.
Note that you don't need re.S,
as there is no dot in the pattern.
data_augmentation_options {\s+random_horizontal_flip\s+{(?:\s+keypoint_flip_permutation: \d+)+\s*}\s*}\s*
Explanation
data_augmentation_options {
  Match literally
\s+random_horizontal_flip\s+
  Match the starting line
{
  Match literally
(?:
  Non-capture group
\s+keypoint_flip_permutation: \d+
  Match the string followed by 1+ digits
)+
  Repeat 1+ times
\s*}
  Match optional whitespace chars and }
\s*}
  Match optional whitespace chars and }
\s*
  Match optional whitespace chars
If you want to remove only the trailing newline, you can match \r?\n at the end instead of \s*.
For example:
print(re.sub(r"data_augmentation_options {\s+random_horizontal_flip\s+{(?:\s+keypoint_flip_permutation: \d+)+\s*}\s*}\s*", "", s))
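For completeness, the call above could be wrapped into a self-contained script along the following lines. This is only a sketch: the sample string is abbreviated, its two-space indentation is an assumption about the real file, and the names PATTERN and remove_flip_block are hypothetical helpers introduced for illustration.

```python
import re

# Abbreviated sample input; the indentation and the number of
# keypoint_flip_permutation lines are assumptions for illustration.
s = """batch_size: 4
num_steps: 30
data_augmentation_options {
  random_horizontal_flip {
    keypoint_flip_permutation: 0
    keypoint_flip_permutation: 2
    keypoint_flip_permutation: 10
    keypoint_flip_permutation: 9
  }
}
data_augmentation_options {
  random_crop_image {
    min_aspect_ratio: 0.5
    max_aspect_ratio: 1.7
    random_coef: 0.25
  }
}
"""

# Same pattern as in the answer, split over two lines for readability.
PATTERN = re.compile(
    r"data_augmentation_options {\s+random_horizontal_flip\s+{"
    r"(?:\s+keypoint_flip_permutation: \d+)+\s*}\s*}\s*"
)


def remove_flip_block(text):
    """Return text with the random_horizontal_flip option block removed."""
    return PATTERN.sub("", text)


result = remove_flip_block(s)
print(result)
```

Note that the trailing \s* also consumes any indentation in front of whatever follows the removed block, which is harmless here because the next block starts at column 0.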
Upvotes: 2
Reputation: 103844
You can use a lookahead to stop the deletion at the second pattern:
>>> re.sub(r'^[ \t]*data_augmentation_options[\s\S]+?(?=^[ \t]*data_augmentation_options)','\n\n',s, flags=re.M)
batch_size: 4
num_steps: 30
data_augmentation_options {
random_crop_image {
min_aspect_ratio: 0.5
max_aspect_ratio: 1.7
random_coef: 0.25
}
}
The (?=^[ \t]*data_augmentation_options)
is the lookahead.
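One thing to keep in mind is that the pattern above removes whichever data_augmentation_options block is followed by another one, regardless of its contents. A possible variation, sketched below, additionally anchors the match on random_horizontal_flip. It assumes that another data_augmentation_options block follows the one being removed (otherwise the lookahead fails and nothing is deleted); the sample string is abbreviated and the name FLIP_BLOCK is hypothetical.

```python
import re

# Abbreviated sample input with assumed two-space indentation.
s = """batch_size: 4
num_steps: 30
data_augmentation_options {
  random_horizontal_flip {
    keypoint_flip_permutation: 0
    keypoint_flip_permutation: 2
  }
}
data_augmentation_options {
  random_crop_image {
    min_aspect_ratio: 0.5
  }
}
"""

# Start at a data_augmentation_options block that opens with
# random_horizontal_flip, then lazily consume everything up to (but not
# including) the next line that begins with data_augmentation_options.
FLIP_BLOCK = re.compile(
    r"^[ \t]*data_augmentation_options\s*{\s*random_horizontal_flip"
    r"[\s\S]+?(?=^[ \t]*data_augmentation_options)",
    flags=re.M,
)

cleaned = FLIP_BLOCK.sub("", s)
print(cleaned)
```

Because the lookahead does not consume the following block, the replacement can be an empty string and the remaining text keeps its original line breaks.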
Upvotes: 0
Reputation: 54148
A few modifications turn your regex into:
data_augmentation_options {\s+random_horizontal_flip\s+{(\s+keypoint_flip_permutation:\s\d+\s)+\s+}\s+}
The changes are:
replace [\s] by just \s, which is equivalent
add \s+ inside the capture group ()
replace \d by \d+ to match multi-digit numbers
Upvotes: 0