Casey

Reputation: 161

Iteratively populate dataframes using a for loop in Julia

I am looking to find a way to iteratively populate a dataframe in Julia.

I have a working function that creates multiple points along a line:

#function to draw QMD lines
using DataFrames
function make_lines(qmd)
    BA=Float64[]
    TPA=Float64[]
    QMD=Int[]
    for i in stk_percent
        tpa= 1*(i*10)/(a[1]+a[2]*(-0.259+0.973*qmd)+a[3]*qmd^2)
        ba=pi*(qmd/24)^2*tpa
        push!(TPA,tpa)
        push!(BA,ba)
        push!(QMD,qmd)
    end
    return DataFrame(TPA=TPA,BA=BA,QMD=QMD)
end

Next, I want to run the make_lines function in a loop over a predefined set of inputs and collect all the outputs in one single dataframe, but I cannot get it to work.

dia = [7, 8, 10, 12, 14, 16, 18, 20, 22]

# can't get for loop to append all the data frames?
for i in dia
  df=DataFrame(TPA=Float64[],BA=Float64[],QMD=Int[])
  append!(df,make_lines(i))
return df
end

At first I thought the problem was how I was using DataFrames, since I have never used push! and similar functions before, but I got this code chunk to work:

#this works to combine dataframe
test=make_lines(22)
test2=make_lines(8)
test[:]
append!(test,test2)

So why, when I run the for loop, do I end up with only the last dataframe it produces?

Am I misinterpreting something? From what I have read, dataframes in Julia work differently from dataframes in R, but I cannot wrap my head around how to get this working.

Upvotes: 1

Views: 2585

Answers (2)

Ramiro Canchucaja

Reputation: 11

I managed to create a blank dataframe by providing the column types and the column names:

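using DataFrames, Dates

# Column element types, in order
types = [DateTime; fill(Float64, 2); String; fill(Float64, 2)]
# Older DataFrames releases accepted the type vector directly;
# DataFrames 1.x expects empty typed columns instead.
df = DataFrame([T[] for T in types], ["Date", "A", "B", "Letter", "C", "D"])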

Then I can populate the new dataframe inside the for loop by calling rename! on each result, so that its column names match, and then append! to add it. This is very useful for large datasets with numerous columns.
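As a rough sketch of that rename! then append! pattern (the column names, the per-iteration `chunk` frame, and its values are all invented here for illustration):

```julia
using DataFrames, Dates

# Empty frame with typed columns (illustrative names)
df = DataFrame(Date=DateTime[], A=Float64[], Letter=String[])

for k in 1:3
    # Hypothetical per-iteration result whose column names differ from df's
    chunk = DataFrame(x1=[DateTime(2020, 1, k)],
                      x2=[Float64(k)],
                      x3=[string('a' + k - 1)])
    rename!(chunk, names(df))   # align column names with the target frame
    append!(df, chunk)          # add the rows to df in place
end
```

After the loop, df should hold three rows, one per iteration.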

Upvotes: 0

Michael Ohlrogge

Reputation: 10990

You are pretty close, but there are a couple of places where you are getting tripped up in your code. You currently have:

dia = [7, 8, 10, 12, 14, 16, 18, 20, 22]

# can't get for loop to append all the data frames?
for i in dia
  df=DataFrame(TPA=Float64[],BA=Float64[],QMD=Int[])
  append!(df,make_lines(i))
return df
end

This isn't quite what you want for two reasons:

One: This snippet isn't a function, so having return in it doesn't make sense and will cause problems.

Two: At each step in your loop, you are re-creating your dataframe df from scratch, discarding everything you put in it on earlier iterations. This is why, as you say, you only end up with the last data frame that it produces. Instead, create df once before the loop and only append inside it:

dia = [7, 8, 10, 12, 14, 16, 18, 20, 22]

df=DataFrame(TPA=Float64[],BA=Float64[],QMD=Int[])
for i in dia
  append!(df,make_lines(i))
end
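Here is a small self-contained check of that pattern. Since I don't have your real make_lines inputs, `toy_lines` below is a made-up stand-in that just returns two rows per call:

```julia
using DataFrames

# Hypothetical stand-in for make_lines: returns two rows per call
toy_lines(qmd) = DataFrame(TPA=[1.0, 2.0], BA=[0.5, 1.5], QMD=[qmd, qmd])

dia = [7, 8, 10]

# The accumulator is created once, before the loop
df = DataFrame(TPA=Float64[], BA=Float64[], QMD=Int[])
for i in dia
    append!(df, toy_lines(i))   # mutates df in place, so rows accumulate
end

nrow(df)          # 6: two rows for each of the three diameters
unique(df.QMD)    # [7, 8, 10]
```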

Note: I couldn't get a completely working version of your code going. The objects stk_percent and a in your main function never get defined, so I didn't really know what to put in for those. That said, I made up some values for them and the loop above worked fine, so I believe that once you fix these issues you'll be in a better spot.

Performance Tip: When you do fix those, my recommendation would be to make them explicit arguments that you pass to your function. Although the code will still work if they are just variables in the global scope, this leads to suboptimal performance, both now and in the future, and potentially to worse things, like confusion over the scope of variables or values changing when you don't want them to. It is best to adopt as many best practices as is practicable right from the start of your journey with Julia.
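A minimal sketch of what that could look like, assuming stk_percent is a vector of percentages and a is a vector of three coefficients (the values below are placeholders I invented, not your data). It also uses reduce(vcat, ...) as an alternative to appending in a loop:

```julia
using DataFrames

# stk_percent and a are passed in rather than read from global scope
function make_lines(qmd, stk_percent, a)
    TPA = Float64[]
    BA = Float64[]
    for p in stk_percent
        tpa = (p * 10) / (a[1] + a[2] * (-0.259 + 0.973 * qmd) + a[3] * qmd^2)
        push!(TPA, tpa)
        push!(BA, pi * (qmd / 24)^2 * tpa)
    end
    return DataFrame(TPA=TPA, BA=BA, QMD=fill(qmd, length(TPA)))
end

stk_percent = [20, 40, 60]    # placeholder values (assumption)
a = [0.1, 0.2, 0.3]           # placeholder coefficients (assumption)
dia = [7, 8, 10, 12, 14, 16, 18, 20, 22]

# Build one frame per diameter and stack them in a single call
df = reduce(vcat, [make_lines(d, stk_percent, a) for d in dia])
```

With three percentages and nine diameters, df ends up with 27 rows. Either approach works; the loop with append! is closer to your original code, while reduce(vcat, ...) avoids having to declare the empty dataframe up front.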

Upvotes: 3
