Valeria

Reputation: 1220

Julia Flux: writing a regularizer depending on the provided regularization coefficients

As a way to get to know Julia, I am writing a script that converts a Keras (v1.1.0) model from Python to a Flux model in Julia, and I am struggling with implementing regularization (I have read https://fluxml.ai/Flux.jl/stable/models/regularisation/).

So, in Keras's json model I have something like "W_regularizer": {"l2": 0.0010000000474974513, "name": "WeightRegularizer", "l1": 0.0} for each Dense layer, and I want to use these coefficients to create regularization in the Flux model. The problem is that in Flux, regularization is added directly to the loss instead of being defined as a property of the layer itself.

To avoid posting too much code here, I've added it to a repo. Here is a small script that takes the json and creates a Flux Chain: https://github.com/iegorval/Keras2Flux.jl/blob/master/Keras2Flux/src/Keras2Flux.jl

Now, I want to create a penalty for each Dense layer with the predefined l1/l2 coefficient. I tried to do it like this:

using Pkg
pkg"activate /home/username/.julia/dev/Keras2Flux"

using Flux
using Keras2Flux
using LinearAlgebra

function get_penalty(model::Chain, regs::Array{Any, 1})
    index_model = 1
    index_regs = 1
    penalties = []
    for layer in model
        if layer isa Dense
            println(regs[index_regs](layer.W))   
            penalty(m) = regs[index_regs](m[index_model].W)
            push!(penalties, penalty)
            #println(regs[i])
            index_regs += 1
        end
        index_model += 1
    end
    total_penalty(m) = sum([p(m) for p in penalties])
    println(total_penalty)
    println(total_penalty(model))
    return total_penalty
end

model, regs = convert_keras2flux("examples/keras_1_1_0.json")
penalty = get_penalty(model, regs)

So, I create a penalty function for each Dense layer and then sum it up to the total penalty. However, it gives me this error: ERROR: LoadError: BoundsError: attempt to access 3-element Array{Any,1} at index [4]

I understand what it means, but I really don't understand how to fix it. It seems that when I call total_penalty(model), it uses index_regs == 4 (that is, the values of index_regs and index_model as they are AFTER the for-loop). Instead, I want each penalty to use the indices as they were at the moment that penalty was pushed onto the list.
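To check my understanding, here is a minimal sketch of the same behavior without any Flux, using a counter that is mutated inside a loop just like index_regs:

function make_closures()
    fs = []
    i = 1                      # counter declared outside the loop, like index_regs
    for _ in 1:3
        push!(fs, () -> i)     # each closure captures the variable i itself
        i += 1                 # so mutating i changes what every closure sees
    end
    return fs
end

[f() for f in make_closures()]  # => [4, 4, 4], not [1, 2, 3]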

On the other hand, a list of values instead of a list of functions would not be correct either, because I define the loss as loss(x, y) = binarycrossentropy(model(x), y) + total_penalty(model). With a list of values I would get a static total_penalty, while it should be recalculated for every Dense layer on every step of model training.

I would be thankful if somebody with Julia experience could give me some advice, because I am definitely failing to understand how this works in Julia and, specifically, in Flux. How would I create a total_penalty that is recalculated automatically during training?

Upvotes: 1

Views: 296

Answers (1)

darsnack

Reputation: 925

There are a couple parts to your question, and since you are new to Flux (and Julia?), I will answer in steps. But I suggest the solution at the end as a cleaner way to handle this.

First, there is the issue of p(m) calculating the penalty using the values of index_regs and index_model after the for-loop. This is because of the scoping rules in Julia: when you define the closure penalty(m) = regs[index_regs](m[index_model].W), index_regs is bound to the variable defined in get_penalty, not to its current value. So, as index_regs changes, so does the output of p(m). The other issue is naming the function penalty(m). Every time you run this line, you are redefining penalty and all the references to it that you pushed onto penalties. Instead, you should prefer to create an anonymous function. Here is how we incorporate these changes:

function get_penalty(model::Chain, regs::Array{Any, 1})
    index_model = 1
    index_regs = 1
    penalties = []
    for layer in model
        if layer isa Dense
            println(regs[index_regs](layer.W))
            # the let block binds fresh locals to the *current* index values,
            # so each anonymous function captures its own copy
            penalty = let i = index_regs, index_model = index_model
                m -> regs[i](m[index_model].W)
            end
            push!(penalties, penalty)
            index_regs += 1
        end
        index_model += 1
    end
    total_penalty(m) = sum([p(m) for p in penalties])
    return total_penalty
end

I used i and index_model in the let block to drive home the scoping rules. I'd encourage you to replace the anonymous function in the let block with global penalty(m) = ... (and remove the assignment to penalty before the let block) to see the difference between using anonymous and named functions.
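To see the let trick in isolation, here is the same kind of toy counter (nothing Flux-specific), where each closure now keeps its own copy of the index:

function make_closures_fixed()
    fs = []
    i = 1
    for _ in 1:3
        f = let i = i          # bind a fresh local to the *current* value of i
            () -> i            # the closure captures the let-local, not the shared counter
        end
        push!(fs, f)
        i += 1
    end
    return fs
end

[f() for f in make_closures_fixed()]  # => [1, 2, 3]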


But, if we go back to your original issue, you want to calculate the regularization penalty for your model using the stored coefficients. Ideally, these would be stored with each Dense layer as in Keras. You can recreate the same functionality in Flux:

using Flux, Functors
using LinearAlgebra: norm

# Wrap a Dense layer together with its regularization coefficients,
# mirroring how Keras stores W_regularizer on the layer itself
struct RegularizedDense{T, LT<:Dense}
    layer::LT
    w_l1::T
    w_l2::T
end

@functor RegularizedDense   # let Flux collect the wrapped layer's parameters

# the forward pass just delegates to the inner Dense layer
(l::RegularizedDense)(x) = l.layer(x)

penalty(l) = 0   # fallback: unregularized layers contribute nothing
penalty(l::RegularizedDense) =
    l.w_l1 * norm(l.layer.W, 1) + l.w_l2 * norm(l.layer.W, 2)
penalty(model::Chain) = sum(penalty(layer) for layer in model)
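To sanity-check the pieces (the dimensions and coefficients here are made up):

d = RegularizedDense(Dense(10, 5, relu), 0.0, 0.001)
x = rand(Float32, 10)
d(x)          # forward pass just delegates to the wrapped Dense
penalty(d)    # l1/l2 penalty computed from the stored coefficients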

Then, in your Keras2Flux source, you can redefine get_regularization to return w_l1_reg and w_l2_reg instead of functions (a sketch follows below). And in create_dense you can do:

function create_dense(config::Dict{String,Any}, prev_out_dim::Int64=-1)
    # ... code you have already written
    dense = Dense(in, out, activation; initW = init, initb = zeros)
    w_l1, w_l2 = get_regularization(config)
    return RegularizedDense(dense, w_l1, w_l2)
end
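For reference, a hypothetical get_regularization matching the JSON shape from your question could be as simple as this (the key names and defaults are assumptions, so adapt it to your actual config parsing):

function get_regularization(config::Dict{String,Any})
    # assumes the Keras v1 layout: "W_regularizer" => Dict("l1" => ..., "l2" => ...)
    reg = get(config, "W_regularizer", nothing)
    reg === nothing && return (0.0, 0.0)   # no regularizer on this layer
    return (get(reg, "l1", 0.0), get(reg, "l2", 0.0))
end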

Lastly, you can compute your loss function like so:

loss(x, y, m) = binarycrossentropy(m(x), y) + penalty(m)
# ... later, for training (opt is whichever optimizer you use)
Flux.train!((x, y) -> loss(x, y, m), params(m), training_data, opt)

We define loss as a function of (x, y, m) so that the model is passed in as an argument rather than referenced as a global variable; closing over non-constant globals is a common performance pitfall in Julia.

So, in the end, this approach is cleaner because after model construction, you don't need to pass around an array of regularization functions and figure out how to index each function correctly with the corresponding dense layer.

If you prefer to keep the regularizer and model separate (i.e. have standard Dense layers in your model chain), then you can do that too. Let me know if you want that solution, but I'll leave it out for now.

Upvotes: 3
