Timothy Chu

Reputation: 163

FluxML Julia Linear Regression Question: How to train Model Properly?

I'm looking at the FluxML Julia tutorial here: https://fluxml.ai/getting_started.html . I've attached the unedited code snippet below. It aims to train a simple linear regression model.

What is the code tagged "# Training process" doing? In particular, it seems that if x = [1,2,3,4,5] and y = [0,0.1], then 'd' in the for loop is (1, 0), (2, 0.1), (1, 0), (2, 0.1), which can't be what the tutorial is aiming to accomplish. If the code is incorrect, how should it be written?


#Import Flux
using Flux

#Create some train data
x = rand(5)
y = rand(2) 

#Define your model
model(x) = W*x .+ b

#Set initial random weights for your model
W = rand(2, 5)
b = rand(2)

#Define a loss function
function loss(x, y)
    ŷ = model(x)
    sum((y .- ŷ).^2)
  end

#Set an optimiser
opt = Descent(0.1)

#Zip the train data
data = zip(x, y)

# Track the derivatives of W and b
ps = params([W, b])

# Training process
for d in data
  gs = Flux.gradient(ps) do
    loss(d...)
  end
  Flux.Optimise.update!(opt, ps, gs)
end

# Execute one training step using the train! function
Flux.train!(loss, params(model), data, opt)

Upvotes: 0

Views: 151

Answers (1)

chipbuster

Reputation: 121

if x = [1,2,3,4,5] and y = [0,0.1] then 'd' in the for loop is (1, 0), (2,0.1), (1, 0), (2, 0.1)

Almost, but not quite. Unlike NumPy-style operations in Python, operations in Julia do not automatically broadcast to try to fill dimensions. zip simply stops as soon as the shorter of its two inputs is exhausted. In your example, since y is the shorter input, d takes the values (1, 0.0) and (2, 0.1) in turn, i.e. collect(data) == [(1, 0.0), (2, 0.1)].
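To check the zip behaviour described above without involving Flux at all, here is a minimal sketch in plain Julia. The variable name `zipped` is only illustrative and does not appear in the tutorial.

```julia
# Example inputs from the question; y is promoted to Float64 because of 0.1
x = [1, 2, 3, 4, 5]
y = [0, 0.1]

# zip stops at the end of the shorter input, so only two pairs are produced
zipped = collect(zip(x, y))

for d in zipped
    println(d)   # each d is a tuple of two scalars, not (vector, vector)
end

println(length(zipped))   # 2
```

Each d here is a pair of scalars, so loss(d...) in the tutorial's loop ends up being called with one number from x and one number from y instead of the full input and target vectors.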

This is not what the authors intended (a PR which will fix this has been opened). They meant to write a loop in which d goes over all (input, target) training samples, and in this case there is exactly one of them. There are a few ways to write this: the simplest is just to change data = zip(x, y) to data = [(x, y)].
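For reference, here is a sketch of the tutorial's training loop with that one-line change applied. It assumes a Flux release that still supports the implicit-parameter style used in the question (Flux.params, Flux.gradient(ps) do ... end, Flux.Optimise.update!); newer releases favour a different, explicit style, so treat this as illustrative only. Writing Flux.params(W, b) instead of params([W, b]), and the names loss_before and loss_after, are my own choices and not part of the tutorial.

```julia
using Flux

# One training sample: a 5-element input and a 2-element target
x = rand(5)
y = rand(2)

# Model parameters and the linear model itself
W = rand(2, 5)
b = rand(2)
model(x) = W*x .+ b

# Sum-of-squares loss, as in the tutorial
loss(x, y) = sum((y .- model(x)).^2)

opt = Descent(0.1)

# A one-element collection holding the single (input, target) pair
data = [(x, y)]

# Track W and b (implicit-parameter API; an assumption about the Flux version)
ps = Flux.params(W, b)

loss_before = loss(x, y)

for d in data
    # d is now (x, y), so loss(d...) is loss(x, y) on the full vectors
    gs = Flux.gradient(ps) do
        loss(d...)
    end
    Flux.Optimise.update!(opt, ps, gs)   # updates W and b in place
end

loss_after = loss(x, y)
println((loss_before, loss_after))
```

With data = [(x, y)] the loop body runs exactly once, and after that single gradient step the loss on the sample should be lower than before.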

Upvotes: 2
