Reputation: 17091
Is there a fast way to replace the NA values in a DataFrame with the last observed value (last observation carried forward, LOCF)?
using DataFrames
d = @data [1,NA,5,NA,NA]
df = DataFrame(d=d)
result = filled_with_locf(df)  # the function I'm looking for
expected = [1,1,5,5,5]
Upvotes: 5
Views: 993
Reputation: 83
If you are new to Julia and don't understand why Dan Getz's answer worked, check out my explanation in a similar thread.
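For readers in the same situation, here is a small sketch of the index trick used in the answers here, written in Julia 1.x syntax (missing and ismissing replaced the old NA and isna, and accumulate(max, ...) replaced cummax). It prints nothing; it just shows the intermediate vectors for the question's data. The variable names are illustrative only.

```julia
# Question's data, written with `missing` instead of the old `NA`.
v = [1, missing, 5, missing, missing]

# Step 1: each observed position holds its own index, each missing one holds 0.
marks = [i * !ismissing(v[i]) for i in 1:length(v)]   # [1, 0, 3, 0, 0]

# Step 2: the running maximum is the index of the last observed value so far.
idx = accumulate(max, marks)                          # [1, 1, 3, 3, 3]

# Step 3: indexing with those positions carries each value forward.
filled = v[idx]                                       # [1, 1, 5, 5, 5]
```

If the first element is missing, marks starts with 0, idx starts with 0, and v[idx] throws a BoundsError.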
Upvotes: 1
Reputation: 11
To avoid a BoundsError when the first value of the column is missing, pass init=1 to the accumulate call; leading missing values then simply stay missing. This version also treats NaN as missing.
locf(v) = v[accumulate(max, [i* !(ismissing(v[i])|isnan(v[i])) for i in 1:length(v)], init = 1)]
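A quick check of the init=1 version on a vector that starts with a missing value and also contains a NaN. The function body is the one from this answer with only spacing changed; the test vector is made up for illustration.

```julia
# The init=1 variant, unchanged apart from spacing.
locf(v) = v[accumulate(max, [i * !(ismissing(v[i]) | isnan(v[i])) for i in 1:length(v)], init = 1)]

# The running max starts at 1, so a leading missing maps to position 1
# (itself) instead of the invalid index 0.
v = [missing, 2.0, NaN, missing, 7.0]
filled = locf(v)   # [missing, 2.0, 2.0, 2.0, 7.0]
```

Because isnan is only defined for numbers, this variant will throw a MethodError on non-numeric columns such as strings; dropping the isnan part avoids that.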
Upvotes: 1
Reputation: 1305
I wrote this loop-based version. It should also work, although you may need to adjust it for your specific case: it treats 0.0 as a missing value and assumes the observed values are positive.
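function locf(x::Array{Float64})
    # 0.0 is treated as "missing"; every other value counts as observed
    dx = copy(x)
    for i in 2:length(dx)
        if dx[i] == 0.0
            dx[i] = dx[i-1]  # carry the last observed value forward
        end
    end
    return dx
end
dummy_array = [1.0, 0.0, 5.0, 0.0, 0.0]  # example input (illustrative)
na_locf = locf(dummy_array)  # [1.0, 1.0, 5.0, 5.0, 5.0]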
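The same loop idea can work on missing values directly instead of a 0.0 sentinel, so zeros and negative numbers are kept as real data. This is a minimal sketch for Julia 1.x; locf_loop is a hypothetical name, not a library function.

```julia
# Hypothetical helper: loop-based LOCF over `missing`, no sentinel value needed.
function locf_loop(v::AbstractVector)
    out = copy(v)
    for i in 2:length(out)
        # A missing entry takes the (already filled) previous value.
        # Leading missings stay missing because out[1] is never replaced.
        if ismissing(out[i])
            out[i] = out[i-1]
        end
    end
    return out
end

filled = locf_loop([1, missing, 5, missing, missing])   # [1, 1, 5, 5, 5]
```

This is a single pass over the vector and does not build a temporary index vector, at the cost of being a few lines longer than the one-liners.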
Upvotes: 0
Reputation: 18217
Expanding on the one-liner from the comments, if we define locf as:
locf(v) = v[cummax([i*!isna(v[i]) for i=1:length(v)])]
Then,
nona_df = DataFrame(Any[locf(df[c]) for c in names(df)],names(df))
And,
julia> nona_df
5×1 DataFrames.DataFrame
│ Row │ d │
├─────┼───┤
│ 1 │ 1 │
│ 2 │ 1 │
│ 3 │ 5 │
│ 4 │ 5 │
│ 5 │ 5 │
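On Julia 1.x, isna and cummax no longer exist; a roughly equivalent spelling of the one-liner uses ismissing and accumulate(max, ...). This is a sketch under that assumption, tested here on a plain vector rather than a DataFrame.

```julia
# Julia 1.x spelling: `isna` -> `ismissing`, `cummax` -> `accumulate(max, ...)`.
# Like the original, it throws a BoundsError when the first element is missing.
locf(v) = v[accumulate(max, [i * !ismissing(v[i]) for i in 1:length(v)])]

col = [1, missing, 5, missing, missing]
filled = locf(col)   # [1, 1, 5, 5, 5]
```

With DataFrames.jl 1.x, the old df[c] column indexing has been removed, and the column-wise step can likely be written as mapcols(locf, df), which applies locf to every column and returns a new DataFrame.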
Upvotes: 4