Valeria

Reputation: 1220

Keras W_constraint and W_regularizer analogues in Julia's Flux

I am trying to parse a Keras JSON file to create a Flux model in Julia (Keras v1.1.0 and Flux v0.10.4).

Example of a Dense layer configuration:

{
    "class_name": "Dense", 
    "config": {
        "W_constraint": null, 
        "b_constraint": null, 
        "name": "dense_1", 
        "output_dim": 512, 
        "activity_regularizer": null, 
        "trainable": true, 
        "init": "glorot_normal", 
        "bias": true, 
        "input_dtype": "float32", 
        "input_dim": 4096, 
        "b_regularizer": null, 
        "W_regularizer": {
            "l2": 0.0010000000474974513, 
            "name": "WeightRegularizer", 
            "l1": 0.0
        }, 
        "activation": "relu", 
        "batch_input_shape": [null, 4096]
    }
}

So, it is clear to me how to define the input/output dimensions, activation function, and parameter initialization in Flux. But what about W_constraint and W_regularizer? I have not found anything similar in Flux's Dense layer. Does it exist? Should I implement it myself? Are those parameters of the Dense layer even important, or can they be safely skipped when creating the Flux model without severely altering performance?

Upvotes: 1

Views: 293

Answers (1)

phipsgabler

Reputation: 20950

The regularization values are norms of the parameters that are summed over the network and added to the loss function; in Flux you have to do that "manually", but it's quite easy and described in the docs.
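To make this concrete, here is a minimal sketch for the Dense layer from the question, assuming Flux v0.10 (where a `Dense` layer exposes its weight matrix as `m.W`). The loss function and data are placeholders; only the penalty term mirrors the Keras config, which applies L2 (coefficient ≈ 0.001) to the weight matrix only, not the bias:

```julia
using Flux

# Layer matching the Keras config: 4096 -> 512, relu, glorot init.
m = Dense(4096, 512, relu; initW = Flux.glorot_normal)

# L2 coefficient taken from "W_regularizer" -> "l2" in the JSON.
l2coeff = 0.001f0

# Keras regularizes only W here (b_regularizer is null), so we
# penalize just the weight matrix.
penalty() = l2coeff * sum(abs2, m.W)

# Add the penalty to whatever base loss you use:
loss(x, y) = Flux.mse(m(x), y) + penalty()
```

If the JSON had a nonzero `l1` value as well, you would add `l1coeff * sum(abs, m.W)` in the same way.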

Parameter constraints in Keras are apparently implemented using projection methods, which are part of the optimizer. That is less trivial to implement; I suggest reading a bit about proximal gradient methods. You would probably have to implement your own optimizer type in Flux that does this (ideally wrapping one of the existing ones). Maybe ProximalOperators.jl can do some of the heavy lifting. On the other hand, models with parameter constraints are, as far as I have seen, much less common, and you might get away with just leaving them unimplemented for now.
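As a rough illustration of the projection idea (not a Flux API; `maxnorm!` and the loop below are my own sketch): Keras's `maxnorm` constraint rescales each unit's incoming-weight vector to have norm at most `c`. In Flux's `(out, in)` weight layout that means projecting each row of `W` after every gradient step:

```julia
using Flux
using LinearAlgebra: norm

# Hypothetical projection step mimicking Keras's maxnorm constraint:
# clamp the norm of each row of W (each unit's incoming weights) to c.
function maxnorm!(W, c)
    for i in axes(W, 1)
        n = norm(view(W, i, :))
        if n > c
            W[i, :] .*= c / n
        end
    end
end

# In a manual training loop (Flux v0.10 style), the projection is
# applied immediately after the optimizer update:
# for (x, y) in data
#     gs = gradient(() -> loss(x, y), params(m))
#     Flux.Optimise.update!(opt, params(m), gs)
#     maxnorm!(m.W, 2.0)   # project back onto the constraint set
# end
```

A proper proximal method would fold the projection into the optimizer itself, but for simple constraints like this the post-update projection is equivalent.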

Upvotes: 2
