Reputation: 5515
I am trying to understand how DataFrames work in Julia and I am having a rough time.
In Python I usually worked with DataFrames by adding new columns at every simulation step and populating each row with values.
For example, I have this DataFrame, which contains the input data:
using DataFrames, Dates
df = DataFrame( A=Int[], B=Int[] )
push!(df, [1, 10])
push!(df, [2, 20])
push!(df, [3, 30])
Now, let's say that I do calculations based on those A and B columns that generate a third column C with DateTime objects. But DateTime objects are not generated for all rows; they could be null.
# Pseudocode of what I intend to do
df[!, :C] .= nothing
for r in eachrow(df)
    if condition
        r.C = mySuperComplexFunctionThatReturnsDateTimeForEachRow()
    else
        r.C = nothing
    end
end
To give runnable, concrete code, let's fake the condition and the function:
df[!, :C] .= nothing
for r in eachrow(df)
    if r.A == 2
        r.C = Dates.now()
    else
        r.C = nothing
    end
end
Upvotes: 3
Views: 2580
Reputation: 69949
The efficient pattern to do this is:
df.C = f.(df.A, df.B)
where f is a function that takes scalars and calculates an output based on them (i.e. your simulation code), and you pass to it the columns you need to extract from df to perform the calculations. In this way the Julia compiler will be able to generate fast (type-stable) native code.
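As a minimal sketch of this pattern, assuming a made-up row function (the name f, its rule, and the fixed dates are placeholders for the real simulation code, not part of the question):

```julia
using DataFrames, Dates

df = DataFrame(A = [1, 2, 3], B = [10, 20, 30])

# Hypothetical stand-in for the simulation code: it receives the scalar
# values of one row and returns either a DateTime or nothing.
f(a, b) = a == 2 ? DateTime(2020, 1, 1) + Day(b) : nothing

# Broadcasting f over the two columns builds the whole new column at once.
df.C = f.(df.A, df.B)
```

Because the results mix DateTime and nothing, the broadcast should produce a column whose element type is Union{Nothing, DateTime}, so no manual initialization of C is needed.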
In your example the function f would be ifelse, so you could write:
df.C = ifelse.(df.A .== 2, Dates.now(), nothing)
Also consider whether you should return nothing or missing (they have different interpretations in Julia: nothing means absence of a value, and missing means that the value is present but is not known; I am not sure which would be better in your case).
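A small sketch of how the two choices behave downstream may help when deciding. The vectors and the fixed date below are illustrative only:

```julia
using Dates

flags = [1, 2, 3] .== 2

with_nothing = ifelse.(flags, DateTime(2020, 1, 1), nothing)
with_missing = ifelse.(flags, DateTime(2020, 1, 1), missing)

# missing propagates through comparisons instead of throwing an error.
missing_cmp = with_missing .> DateTime(2019, 1, 1)

# skipmissing drops the unknown entries before further processing.
known = collect(skipmissing(with_missing))
n_known = count(!ismissing, with_missing)

# nothing does not propagate; it has to be checked for explicitly.
n_set = count(!isnothing, with_nothing)
```

With missing, functions such as skipmissing and ismissing are available for filtering, while comparing nothing against a DateTime raises a MethodError, so nothing tends to surface mistakes earlier.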
Upvotes: 5
Reputation: 226
If you initialize the column with df[!, :C] .= nothing it has the element type Nothing. When writing DateTimes to this column, Julia attempts to convert them to Nothing and fails.
I am not sure if this is the most efficient or recommended solution, but if you initialize the column as a union of DateTime and Nothing
df[!, :C] = Vector{Union{DateTime, Nothing}}(nothing, size(df, 1))
your example should work.
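Put together, a sketch of the question's loop with this initialization could look as follows. A fixed DateTime replaces Dates.now() here only so the result is reproducible:

```julia
using DataFrames, Dates

df = DataFrame(A = [1, 2, 3], B = [10, 20, 30])

# Allocate the column with an element type that accepts both kinds of value.
df[!, :C] = Vector{Union{DateTime, Nothing}}(nothing, nrow(df))

for r in eachrow(df)
    if r.A == 2
        # Writing through the row view updates the parent DataFrame.
        r.C = DateTime(2020, 1, 1)
    end
    # Other rows keep the initial value nothing.
end
```

Since every entry starts as nothing, the else branch from the question is no longer needed.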
Upvotes: 4