Erik

Reputation: 3208

Trying to write a softmax; NNlib softmax also giving unexpected output

I am working through a Python book, but using Julia instead, in order to learn the language, and I have come upon another area where I am not quite clear.

My softmax seemed fine on simple inputs, but when I start tossing more complex matrices at it, the results fall apart. Here is the setup:

include("activation_function_exercise/spiral_data.jl")
include("activation_function_exercise/dense_layer.jl")
include("activation_function_exercise/activation_relu.jl")
include("activation_function_exercise/activation_softmax.jl")

coords, color = spiral_data(100, 3)

dense1 = LayerDense(2,3)
dense2 = LayerDense(3,3)

forward(dense1, coords)
println("Forward 1 layer")
activated_output = relu_activation(dense1.output)
forward(dense2, activated_output)
println("Forward 2 layer")
activated_output2 = softmax_activation(dense2.output)

println("\n", activated_output2)

I get a matrix of the expected 300×3 shape back:

julia> activated_output2
300×3 Matrix{Float64}:
 0.00333346  0.00333337  0.00333335
 0.00333345  0.00333337  0.00333335
 0.00333345  0.00333336  0.00333335
 0.00333344  0.00333336  0.00333335
 0.00333343  0.00333336  0.00333334
 0.00333311  0.00333321  0.00333322

but the book has

>>>
[[0.33333 0.3333 0.3333]
...

It seems my values are a factor of 100 smaller than the book's, even when using FluxML's softmax function.

EDIT:

I thought maybe my ReLU activation code was causing the discrepancy, so I tried switching to the FluxML NNlib version, but I get the same activated_output2 with 0.0033333 instead of 0.333333.

I will keep checking other parts, like my forward function.
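
(The activation_relu.jl file is not pasted in the question; an elementwise ReLU in this style is just a broadcasted max against zero, so a minimal sketch of what it presumably looks like is shown below.)

# minimal sketch of an elementwise ReLU activation in this style
# (assumed; the original activation_relu.jl is not shown in the post)
relu_activation(inputs) = max.(0.0, inputs)

# the NNlib equivalent tried above would be: relu.(inputs)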

EDIT2:

Adding my LayerDense implementation for completeness

LayerDense

# see https://github.com/FluxML/Flux.jl/blob/b78a27b01c9629099adb059a98657b995760b617/src/layers/basic.jl#L71-L111
using Base: Integer, Float64

mutable struct LayerDense
    weights::Matrix{Float64}
    biases::Matrix{Float64}
    num_inputs::Integer
    num_neurons::Integer
    output::Matrix{Float64}   # left undefined until forward() is called
    LayerDense(num_inputs::Integer, num_neurons::Integer) =
        new(0.01 * randn(num_inputs, num_neurons), zeros(1, num_neurons), num_inputs, num_neurons)
end

# forward pass: each row of `inputs` is one sample, so the result is
# (num_samples × num_neurons)
function forward(layer::LayerDense, inputs::Matrix{Float64})
    layer.output = inputs * layer.weights .+ layer.biases
end
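
As a quick sanity check on the convention above (one sample per row), the shapes work out like this:

layer = LayerDense(2, 3)
x = randn(5, 2)        # 5 samples with 2 features each
forward(layer, x)
size(layer.output)     # (5, 3) -> one row of 3 neuron outputs per sample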

EDIT3:

Still seeing the issue with the library softmax, I started inspecting my spiral_data implementation; it seems within reason.

Python

import numpy as np
import nnfs

from nnfs.datasets import spiral_data

nnfs.init()


X, y = spiral_data(samples=100, classes=3)

print(X[:4])  # just check the first couple

>>>
[[0.         0.        ]
 [0.00299556 0.00964661]
 [0.01288097 0.01556285]
 [0.02997479 0.0044481 ]]

JuliaLang

include("activation_function_exercise/spiral_data.jl")

coords, color = spiral_data(100, 3)

julia> coords
300×2 Matrix{Float64}:
  0.0         0.0
 -0.00133462  0.0100125
  0.00346739  0.0199022
 -0.00126302  0.0302767
  0.00184948  0.0403617
  0.0113095   0.0492225
  0.0397276   0.0457691
  0.0144484   0.0692151
  0.0181726   0.0787382
  0.0320308   0.0850793

Upvotes: 1

Views: 133

Answers (1)

Erik

Reputation: 3208

It turned out I was letting the NNlib softmax use its default dims=1, which normalizes down each column (all 300 rows), so every entry came out near 1/300 ≈ 0.00333. The Python book normalizes across each row instead, so all I needed to do was modify my softmax() call like so:

using NNlib

function softmax_activation(inputs)
    # dims=2: normalize across each row (one sample per row),
    # matching the book's NumPy behaviour
    return softmax(inputs, dims=2)
end
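
For a small illustration of the difference: NNlib's softmax defaults to dims=1, which normalizes down each column, whereas dims=2 normalizes across each row, which is what the book's NumPy code does:

using NNlib

x = [1.0 2.0 3.0;
     1.0 2.0 3.0]

softmax(x)           # default dims=1: each column sums to 1, so every entry is 0.5
softmax(x, dims=2)   # dims=2: each row sums to 1 -> ≈ 0.0900 0.2447 0.6652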

Then the output at the end of my big long example comes out as expected

#using Pkg
#Pkg.add("Plots")

include("activation_function_exercise/spiral_data.jl")
include("activation_function_exercise/dense_layer.jl")
include("activation_function_exercise/activation_relu.jl")
include("activation_function_exercise/activation_softmax.jl")

coords, color = spiral_data(100, 3)

dense1 = LayerDense(2,3)
dense2 = LayerDense(3,3)

# Julia doesn't lend itself to OO programming,
# so the following will just be plain function calls
# activation1 = activation_relu
# activation2 = activation_softmax

forward(dense1, coords)
activated_output = relu_activation(dense1.output)
forward(dense2, activated_output)
activated_output2 = softmax_activation(dense2.output)


using Plots

#scatter(coords[:,1], coords[:,2])
scatter(coords[:,1], coords[:,2], zcolor=color, framestyle=:box)

display(activated_output2)

300×3 Matrix{Float64}:
 0.333333  0.333333  0.333333
 0.333336  0.333334  0.33333
 0.333338  0.333339  0.333323
 0.33334   0.333344  0.333316
 0.333339  0.333361  0.3333
 0.333341  0.333365  0.333294
 0.333345  0.333362  0.333293
 0.333345  0.333374  0.333281
 0.333349  0.33337   0.333281
 0.333347  0.33339   0.333262
 ⋮                   
 0.333564  0.332673  0.333764
 0.333583  0.332885  0.333532
 0.333588  0.332967  0.333445
 0.333587  0.333148  0.333265
 0.333593  0.332935  0.333472
 0.333596  0.333006  0.333398
 0.333583  0.33333   0.333086
 0.3336    0.333062  0.333338
 0.333603  0.333082  0.333316

Upvotes: 1
