Reputation: 1220
I am writing a script that converts a Python Keras (v1.1.0) model to a Julia Flux model, as a way to get to know Julia, and I am struggling with implementing regularization (I have read https://fluxml.ai/Flux.jl/stable/models/regularisation/).

In Keras's json model I have something like this for each Dense layer:

"W_regularizer": {"l2": 0.0010000000474974513, "name": "WeightRegularizer", "l1": 0.0}

I want to use these coefficients to create regularization in the Flux model. The problem is that, in Flux, regularization is added directly to the loss instead of being defined as a property of the layer itself.

To avoid posting too much code here, I've added it to the repo. Here is a small script that takes the json and creates a Flux Chain: https://github.com/iegorval/Keras2Flux.jl/blob/master/Keras2Flux/src/Keras2Flux.jl

Now, I want to create a penalty for each Dense layer with the predefined l1/l2 coefficients. I tried to do it like this:
using Pkg
pkg"activate /home/username/.julia/dev/Keras2Flux"
using Flux
using Keras2Flux
using LinearAlgebra
function get_penalty(model::Chain, regs::Array{Any, 1})
    index_model = 1
    index_regs = 1
    penalties = []
    for layer in model
        if layer isa Dense
            println(regs[index_regs](layer.W))
            penalty(m) = regs[index_regs](m[index_model].W)
            push!(penalties, penalty)
            #println(regs[i])
            index_regs += 1
        end
        index_model += 1
    end
    total_penalty(m) = sum([p(m) for p in penalties])
    println(total_penalty)
    println(total_penalty(model))
    return total_penalty
end
model, regs = convert_keras2flux("examples/keras_1_1_0.json")
penalty = get_penalty(model, regs)
So, I create a penalty function for each Dense layer and then sum them up into the total penalty. However, it gives me this error:

ERROR: LoadError: BoundsError: attempt to access 3-element Array{Any,1} at index [4]

I understand what it means, but I really don't understand how to fix it. It seems that when I call total_penalty(model), it uses index_regs == 4 (that is, the values of index_regs and index_model as they are AFTER the for-loop). Instead, I want to use the indices they had at the moment each penalty was pushed onto the list of penalties.

On the other hand, if I built a list of values instead of a list of functions, it also would not be correct, because I will define the loss as:

loss(x, y) = binarycrossentropy(model(x), y) + total_penalty(model)

With a list of values, total_penalty would be static, while it should be recalculated for every Dense layer on every step of the model training.

I would be grateful if somebody with Julia experience could give me some advice, because I am definitely failing to understand how this works in Julia and, specifically, in Flux. How would I create a total_penalty that is recalculated automatically during training?
Upvotes: 1
Views: 296
Reputation: 925
There are a couple of parts to your question, and since you are new to Flux (and Julia?), I will answer in steps. But I suggest the solution at the end as a cleaner way to handle this.

First, there is the issue of p(m) calculating the penalty using the values of index_regs and index_model as they are after the for-loop. This is because of the scoping rules in Julia. When you define the closure penalty(m) = regs[index_regs](m[index_model].W), index_regs is bound to the variable defined in get_penalty. So, as index_regs changes, so does the output of p(m). The other issue is the naming of the function as penalty(m). Every time you run this line, you redefine penalty and all the references to it that you pushed onto penalties. Instead, you should prefer to create an anonymous function. Here is how we incorporate these changes:
function get_penalty(model::Chain, regs::Array{Any, 1})
    index_model = 1
    index_regs = 1
    penalties = []
    for layer in model
        if layer isa Dense
            println(regs[index_regs](layer.W))
            # capture the current counter values in fresh local bindings
            penalty = let i = index_regs, index_model = index_model
                m -> regs[i](m[index_model].W)
            end
            push!(penalties, penalty)
            index_regs += 1
        end
        index_model += 1
    end
    total_penalty(m) = sum([p(m) for p in penalties])
    return total_penalty
end
I used i and index_model in the let block to drive home the scoping rules. I'd encourage you to replace the anonymous function in the let block with global penalty(m) = ... (and remove the assignment to penalty before the let block) to see the difference between using anonymous and named functions.
But, if we go back to your original issue, you want to calculate the regularization penalty for your model using the stored coefficients. Ideally, these would be stored with each Dense layer as in Keras. You can recreate the same functionality in Flux:
using Flux, Functors
using LinearAlgebra: norm
struct RegularizedDense{T, LT<:Dense}
    layer::LT
    w_l1::T
    w_l2::T
end

@functor RegularizedDense

(l::RegularizedDense)(x) = l.layer(x)

penalty(l) = 0
penalty(l::RegularizedDense) =
    l.w_l1 * norm(l.layer.W, 1) + l.w_l2 * norm(l.layer.W, 2)
penalty(model::Chain) = sum(penalty(layer) for layer in model)
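If it helps, here is a quick usage sketch of the wrapper (the layer sizes and coefficients below are made up):

model = Chain(
    RegularizedDense(Dense(10, 5, relu), 0.0f0, 0.001f0),
    RegularizedDense(Dense(5, 1, σ), 0.0f0, 0.001f0)
)

x = rand(Float32, 10)
model(x)          # forward pass just delegates to the wrapped Dense layers
penalty(model)    # sum of the per-layer l1/l2 penalties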
Then, in your Keras2Flux source, you can redefine get_regularization to return w_l1_reg and w_l2_reg instead of functions. And in create_dense you can do:
function create_dense(config::Dict{String,Any}, prev_out_dim::Int64=-1)
    # ... code you have already written
    dense = Dense(in, out, activation; initW = init, initb = zeros)
    w_l1, w_l2 = get_regularization(config)
    return RegularizedDense(dense, w_l1, w_l2)
end
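For reference, here is a sketch of what the reworked get_regularization could look like, assuming the "W_regularizer" json layout from your question (the actual function in your repo may differ):

function get_regularization(config::Dict{String,Any})
    reg = get(config, "W_regularizer", nothing)
    reg === nothing && return (0.0, 0.0)
    return (get(reg, "l1", 0.0), get(reg, "l2", 0.0))
end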
Lastly, you can compute your loss function like so:
loss(x, y, m) = binarycrossentropy(m(x), y) + penalty(m)

# ... later for training (Flux.train! takes the loss, the params, the data, and an optimiser)
train!((x, y) -> loss(x, y, m), params(m), training_data, opt)
We define loss as a function of (x, y, m) rather than closing over a global model, because referencing a non-constant global from inside the loss function is a common performance pitfall in Julia.
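For completeness, here is a rough end-to-end sketch of how the pieces fit together (the optimiser choice, the batch format of training_data, and the assumption that convert_keras2flux now returns a Chain of RegularizedDense layers are all mine):

using Flux
using Keras2Flux

model = convert_keras2flux("examples/keras_1_1_0.json")

loss(x, y, m) = binarycrossentropy(m(x), y) + penalty(m)

opt = ADAM(0.001)          # assumed optimiser
ps = Flux.params(model)
# training_data: any iterable of (x, y) batches
Flux.train!((x, y) -> loss(x, y, model), ps, training_data, opt)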
So, in the end, this approach is cleaner, because after model construction you don't need to pass around an array of regularization functions and figure out how to index each function correctly with the corresponding dense layer.

If you prefer to keep the regularizer and model separate (i.e. have standard Dense layers in your model chain), then you can do that too. Let me know if you want that solution, but I'll leave it out for now.
Upvotes: 3