Reputation: 25924
I am trying to match every block that has type: "Data"
in it and replace it with text of my choosing.
A sample input is given below; there can be one or more of these blocks:
layer {
name: "cifar"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mean_file: "examples/cifar10/mean.binaryproto"
mirror: true
#crop_size: 20
}
# this is a comment!
data_param {
source: "examples/cifar10/cifar10_train_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "cifar"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mean_file: "examples/cifar10/mean.binaryproto"
}
data_param {
source: "examples/cifar10/cifar10_test_lmdb"
batch_size: 25
backend: LMDB
}
}
I came up with this regex:
((layer)( *)((\n))*{((.*?)(\n)*)*(type)( *):( *)("Data")((.*?)(\n)*)*)(.*?)(\n)}
I tried to model this:
find and select a block starting with layer,
followed by any number of space characters and then
a { character,
then anything at all (to keep it simple), and then
type, a colon with optional spaces around it, and then "Data",
then anything again, until a } character is reached.
But clearly this does not work properly. If I change the type in any of these layer blocks, nothing is matched at all, not even the layer that still has type: "Data".
Upvotes: 2
Views: 99
Reputation: 19319
Based on this post about using .NET regular expressions to do bracket matching, you can adapt the regex presented there:
\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\)
It looks for sets of matching ( and ), and you can simply swap those for { and } (noting that they are escaped in that regex). Then you can prefix the layer\s* bit.
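To make the idea behind that regex concrete: the balancing group is effectively a depth counter, pushed on every opening bracket and popped on every closing one. Python's re module has no balancing groups, so the following is only a hand-written sketch of the same counting, not the .NET regex itself. The function name find_matching_brace and the sample string are made up for illustration.

```python
def find_matching_brace(text: str, open_idx: int) -> int:
    """Return the index of the '}' that closes the '{' at open_idx, or -1."""
    if text[open_idx] != "{":
        raise ValueError("open_idx must point at '{'")
    depth = 0
    for i in range(open_idx, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1          # comparable to pushing onto the 'c' stack
        elif ch == "}":
            depth -= 1          # comparable to popping with (?<-c>)
            if depth == 0:
                return i        # the stack is empty again: block is closed
    return -1                   # ran out of text: braces are unbalanced


# Hypothetical one-line input, shaped like the layer blocks in the question.
sample = 'layer { include { phase: TRAIN } top: "data" } layer { }'
start = sample.index("{")
end = find_matching_brace(sample, start)
print(sample[start:end + 1])
```

The final `(?(c)(?!))` in the regex plays the role of the `return -1` branch here: it rejects a match if any opening bracket is still unclosed.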
To exclude blocks where type <> "Data", I've added a negative lookahead for all the other type keywords in your sample in the pastebin. Unfortunately, adding a positive lookahead for type: "Data" simply didn't work; if it had, I think that would be your most robust solution.
Hopefully you have a finite list of type values and can extend this into a practical solution:
layer\s*{(?>{(?<c>)|[^{}](?!type: "Accuracy"|type: "Convolution"|type: "Dropout"|type: "InnerProduct"|type: "LRN"|type: "Pooling"|type: "ReLU"|type: "SoftmaxWithLoss")+|}(?<-c>))*(?(c)(?!))}
The key bit to work with in the original regex is the [^()]+, which matches the content between the brackets that are matched by the other components of the regex. I've adapted that to [^{}]+, meaning 'everything other than the braces', and then added the long 'apart from' clause with the keywords not to match.
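If the list of type values is not finite, one alternative is to match every layer block first and do the positive check for type: "Data" in code afterwards. The sketch below is in Python rather than .NET and is a different technique from the regex above. It assumes braces nest at most one level deep inside a layer (true for the sample in the question) and that no braces appear inside comments or strings. The names LAYER_RE, DATA_TYPE_RE and replace_data_layers, and the shortened sample, are made up for illustration.

```python
import re

# "layer { ... }" whose body may contain { ... } blocks exactly one level
# deep. Deeper nesting is NOT handled; that is an assumption of this sketch.
LAYER_RE = re.compile(r'\blayer\s*\{(?:[^{}]|\{[^{}]*\})*\}')

# The positive check, applied to each matched block separately.
DATA_TYPE_RE = re.compile(r'type\s*:\s*"Data"')


def replace_data_layers(text: str, replacement: str) -> str:
    """Replace every layer block containing type: "Data" with replacement."""
    def swap(match: re.Match) -> str:
        block = match.group(0)
        # Keep non-Data layers untouched; swap out the Data ones.
        return replacement if DATA_TYPE_RE.search(block) else block
    return LAYER_RE.sub(swap, text)


# Shortened, hypothetical input modelled on the question's sample.
sample = '''layer {
  name: "cifar"
  type: "Data"
  include {
    phase: TRAIN
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  convolution_param {
    num_output: 32
  }
}
'''
result = replace_data_layers(sample, "# data layer removed")
print(result)
```

Because the type check runs per block, changing the type of one layer no longer affects whether the others are matched, and there is no keyword list to maintain.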
Upvotes: 1