Hossein

Reputation: 25924

How can I get this regex right in C#?

I am trying to match any block that has type: "Data" in it and then replace it with the text I want.
A sample input is given below; there can be one or more of these blocks:

layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "examples/cifar10/mean.binaryproto"
    mirror: true
    #crop_size: 20 
  }

# this is a comment!
  data_param {
    source: "examples/cifar10/cifar10_train_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "cifar"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mean_file: "examples/cifar10/mean.binaryproto"
  }
  data_param {
    source: "examples/cifar10/cifar10_test_lmdb"
    batch_size: 25
    backend: LMDB
  }
}

I came up with this regex:

((layer)( *)((\n))*{((.*?)(\n)*)*(type)( *):( *)("Data")((.*?)(\n)*)*)(.*?)(\n)}

I tried to model this:

find and select a block starting with layer;
after it there can be any number of space characters, but
then there should be a { character;
then there can be anything (to make it easier), and then
there should be type followed by any number of spaces, a colon, and then "Data";
then anything can follow, until a } character is reached.

But clearly this does not work properly. If I change the type in any of these layer blocks, nothing gets detected, not even the layer that still has type: "Data".

Upvotes: 2

Views: 99

Answers (1)

Robin Mackenzie

Reputation: 19319

Based on this post about using .NET regular expressions to do bracket matching, you can adapt the regex presented there:

\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\)

It looks for sets of matching ( and ), and you can simply swap those for { and } (noting that the parentheses are escaped in that regex).

Then you can prefix it with layer\s*.
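Putting those two steps together, a minimal C# sketch might look like the following. It only finds balanced layer { ... } blocks and does not filter by type yet. The names LayerBlocks and FindLayers are hypothetical, and the braces are escaped as \{ and \} to be on the safe side.

```csharp
using System;
using System.Text.RegularExpressions;

static class LayerBlocks
{
    // layer\s*\{   : the word "layer", optional whitespace, the block's opening brace
    // \{(?<c>)     : a nested opening brace pushes an empty capture onto stack "c"
    // [^{}]+       : a run of characters that are not braces
    // \}(?<-c>)    : a nested closing brace pops one capture off "c"
    //                (this alternative fails when "c" is empty)
    // (?>...)*     : atomic group, so the engine does not backtrack into it
    // (?(c)(?!))   : fail the match if "c" still holds captures (unbalanced braces)
    // \}           : the closing brace of the layer block itself
    public const string BalancedLayer =
        @"layer\s*\{(?>\{(?<c>)|[^{}]+|\}(?<-c>))*(?(c)(?!))\}";

    // Returns the text of every complete layer { ... } block in the input.
    public static string[] FindLayers(string input)
    {
        MatchCollection matches = Regex.Matches(input, BalancedLayer);
        var blocks = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            blocks[i] = matches[i].Value;
        }
        return blocks;
    }
}
```

With the two-layer sample from the question, FindLayers should return two strings, each running from layer to that layer's own closing brace, with the nested include, transform_param and data_param blocks kept inside it.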

To exclude blocks where type is not "Data", I've added a negative lookahead for all the other type keywords in your sample in the pastebin. Unfortunately, adding a positive lookahead for type: "Data" simply didn't work, and I think that if it did, it would be your most robust solution.

Hopefully you have a finite list of type values and you can extend this for a practical solution:

layer\s*{(?>{(?<c>)|[^{}](?!type: "Accuracy"|type: "Convolution"|type: "Dropout"|type: "InnerProduct"|type: "LRN"|type: "Pooling"|type: "ReLU"|type: "SoftmaxWithLoss")+|}(?<-c>))*(?(c)(?!))}

The key bit to work with in the original regex is [^()]+, which matches the content between the brackets that the other components of the regex are balancing. I've adapted that to [^{}] (everything other than the braces) and then added the long 'apart from' clause listing the type keywords that must not match.
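If the list of other types is not finite, one alternative (not from the linked post, just a sketch) is to drop the lookahead entirely: match every balanced layer block and decide in a MatchEvaluator whether to replace it. The names DataLayerReplacer and ReplaceDataLayers are hypothetical. Note that the type check looks at the whole block text, so a commented-out line such as #type: "Data", or a type key inside a nested block, would also count.

```csharp
using System;
using System.Text.RegularExpressions;

static class DataLayerReplacer
{
    // Balanced layer { ... } block, using the same balancing-group idea as above.
    static readonly Regex LayerBlock = new Regex(
        @"layer\s*\{(?>\{(?<c>)|[^{}]+|\}(?<-c>))*(?(c)(?!))\}");

    // type: "Data" with any whitespace around the colon (type:"Data", type : "Data", ...).
    static readonly Regex DataType = new Regex(@"\btype\s*:\s*""Data""");

    // Replaces each layer block whose text contains type: "Data" with
    // `replacement` (inserted literally); other blocks are returned unchanged.
    public static string ReplaceDataLayers(string input, string replacement)
    {
        return LayerBlock.Replace(input, m =>
            DataType.IsMatch(m.Value) ? replacement : m.Value);
    }
}
```

Applied to the question's sample, ReplaceDataLayers would replace both layers, since both have type: "Data", while any layer of another type would be left as it was.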

Upvotes: 1
