

Julia 1.5.2 Performance Questions

I am currently attempting to implement a metaheuristic (genetic) algorithm, and I also want the code to be reasonably fast and efficient. However, I do not have much experience writing efficient code, so I was wondering if people could give some "quick tips" to improve it. I have created a small working example that contains most of the elements the real code will use: preallocating arrays, custom mutable structs, random numbers, pushing into arrays, etc.

I have already looked into the StaticArrays package. However, many of my arrays must be mutable (so I would need MArrays), and many of them will become large (more than 100 elements). The StaticArrays documentation states that static arrays must remain small to be efficient.

According to the documentation, rand() should be thread-safe in Julia 1.5.2. I have therefore attempted to multithread the for loops in my functions to make them run faster, and this results in a slight performance increase.

However, if anyone can see a more efficient way of allocating arrays or pushing SpotPrices into an array, it would be greatly appreciated! Any other performance tips are also very welcome!

# Packages
using DataFrames
using Random
using BenchmarkTools

df = DataFrame( SpotPrice = convert(Array{Float64}, rand(-266:500,8832)),
month = repeat([1,2,3,4,5,6,7,8,9,10,11,12]; outer = 736),
hour = repeat([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24]; outer = 368))

# Data structure for the prices per hour
mutable struct SpotPrices
    hour :: Array{Float64,1}
end

# Fill-out data structure
function setup_prices(df::DataFrame)
    prices = []
    for i in 1:length(unique(df[:,3]))
        push!(prices, SpotPrices(filter(row -> row.hour == i, df).SpotPrice))
    end
    return prices
end

prices = setup_prices(df)

# Sampler function
function MC_Sampler(prices::Vector{Any}, sample_size::Int64)
    # Picking the samples
    tmp = zeros(sample_size, 24)

    # Sampling per hour
    for i in 1:24
        tmp[:,i] = rand(prices[i].hour, sample_size)
    end
    return tmp
end

samples = MC_Sampler(prices, 100)

@btime setup_prices(df)
@btime MC_Sampler(prices,100)

function setup_prices_par(df::DataFrame)
    prices = []
    @sync Threads.@threads for i in 1:length(unique(df[:,3]))
        push!(prices, SpotPrices(filter(row -> row.hour == i, df).SpotPrice))
    end
    return prices
end

# Sampler function
function MC_Sampler_par(prices::Vector{Any}, sample_size::Int64)
    # Picking the samples
    tmp = zeros(sample_size, 24)

    # Sampling per hour
    @sync Threads.@threads for i in 1:24
        tmp[:,i] = rand(prices[i].hour, sample_size)
    end
    return tmp
end

@btime setup_prices_par(df)
@btime MC_Sampler_par(prices,100)

Upvotes: 1


Answers (1)

Przemyslaw Szufel

Reputation: 42254

Have a look at read very carefully

Basic cleanups start with:

  1. Your SpotPrices struct does not need to be mutable. In any case, since it has only one field, you could just define it as an alias: const SpotPrices = Vector{Float64}
  2. You do not want untyped containers: instead of prices = [] use a concretely typed one such as prices = SpotPrices[] (or Float64[] when the elements are plain numbers)
  3. Using DataFrames.groupby will be much faster than finding unique elements and filtering by them
  4. If you do not need to initialize an array, then do not: Matrix{Float64}(undef, sample_size, 24) is faster than zeros(sample_size, 24)
  5. You do not need @sync before a Threads.@threads loop; the loop already waits for all iterations to finish
  6. Create separate random states (RNGs), one for each thread, and pass them to rand whenever you call it
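
As a rough sketch of how these points might fit together (an illustration, not code from the question or a definitive implementation): the names mc_sampler and rngs, the seed values, and the toy data are assumptions made for this example. It also uses one RNG per column rather than strictly one per thread, a variant chosen so that the result does not depend on how loop iterations are scheduled across threads.

```julia
using DataFrames
using Random

# Toy data with the same column names as in the question (values are arbitrary).
df = DataFrame(SpotPrice = Float64.(rand(-266:500, 8832)),
               hour = repeat(1:24; outer = 368))

# Points 1-3: plain Float64 vectors in a concretely typed container,
# built with groupby instead of unique + filter.
function setup_prices(df::DataFrame)
    # sort = true so that group i corresponds to hour i
    groups = groupby(df, :hour; sort = true)
    return Vector{Float64}[Vector{Float64}(g.SpotPrice) for g in groups]
end

# Points 4-6: uninitialized output matrix, no @sync, explicit RNGs.
# Assumption: one RNG per column (not per thread), see the note above.
function mc_sampler(prices::Vector{Vector{Float64}}, sample_size::Int,
                    rngs::Vector{MersenneTwister})
    out = Matrix{Float64}(undef, sample_size, length(prices))
    Threads.@threads for i in eachindex(prices)
        # Fill column i in place with draws from the prices of hour i
        rand!(rngs[i], view(out, :, i), prices[i])
    end
    return out
end

prices = setup_prices(df)
rngs = [MersenneTwister(1000 + i) for i in 1:length(prices)]  # hypothetical seeds
samples = mc_sampler(prices, 100, rngs)
```

Both functions can be timed with @btime as in the question; interpolating the globals (for example @btime mc_sampler($prices, 100, $rngs)) keeps the global-variable lookup out of the measurement.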

Upvotes: 4
