BAR

Reputation: 17091

Julia DataFrame Fill NA with LOCF

Is there a fast way to replace the NA values in a DataFrame with the last observed value (last observation carried forward, LOCF)?

using DataFrames

d = @data [1,NA,5,NA,NA]
df = DataFrame(d=d)

result = filled_with_locf(df)

expected = [1,1,5,5,5]

Upvotes: 5

Views: 993

Answers (4)

Hongtao Hao

Reputation: 83

If you are new to Julia and don't understand why Dan Getz's answer worked, check out my explanation in a similar thread.

Upvotes: 1

btsays

Reputation: 11

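To avoid a BoundsError when the first value of the column is missing, pass init=1 to the accumulate call. A leading missing value then maps to index 1, so it stays missing instead of producing the invalid index 0.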

locf(v) = v[accumulate(max, [i* !(ismissing(v[i])|isnan(v[i])) for i in 1:length(v)], init = 1)]
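To make the effect of init concrete, here is a minimal sketch in Julia 1.x using only Base. The name `locf_init` is hypothetical, and for brevity this version checks only for `missing`, not NaN.

```julia
# Index trick with init = 1: a leading gap maps to index 1 instead of 0.
# `locf_init` is an illustrative name, not part of any library.
locf_init(v) = v[accumulate(max, [ismissing(v[i]) ? 0 : i for i in eachindex(v)], init = 1)]

v = [missing, 2.0, missing, 4.0, missing]
filled = locf_init(v)

# The leading missing stays missing (nothing precedes it); later gaps are filled.
println(filled)
```

Without init = 1, the first index would be 0 for this input and the indexing step would throw a BoundsError.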

Upvotes: 1

Andrew Bannerman

Reputation: 1305

I wrote the function below. It should also work, though you may need to adapt it to your case: it assumes the array holds only positive numbers, with 0.0 marking a gap to fill.

function locf(x::Array{Float64})
    dx = copy(x)    # work on a copy so the input is left untouched
    for i in 2:length(dx)
        # 0.0 marks a gap: carry the previous (already filled) value forward
        if dx[i] == 0.0
            dx[i] = dx[i-1]
        end
    end
    return dx
end

na_locf = locf(dummy_array)

Upvotes: 0

Dan Getz

Reputation: 18217

Expanding on the one-liner from the comments, if we define locf as:

locf(v) = v[cummax([i*!isna(v[i]) for i=1:length(v)])]

Then,

nona_df = DataFrame(Any[locf(df[c]) for c in names(df)],names(df))

And,

julia> nona_df
5×1 DataFrames.DataFrame
│ Row │ d │
├─────┼───┤
│ 1   │ 1 │
│ 2   │ 1 │
│ 3   │ 5 │
│ 4   │ 5 │
│ 5   │ 5 │
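
For readers wondering why the indexing trick works, here is a step-by-step sketch in Julia 1.x using only Base. It is a translation under stated assumptions, not the answer's original code: `missing` stands in for `NA`, `ismissing` for `isna`, and `accumulate(max, ...)` for `cummax`, which newer Julia versions no longer provide. The names `marks`, `idx`, and `filled` are illustrative.

```julia
# `missing` plays the role of the old `NA` here.
v = [1, missing, 5, missing, missing]

# Step 1: each position keeps its own index if observed, or 0 if missing.
marks = [ismissing(v[i]) ? 0 : i for i in eachindex(v)]   # [1, 0, 3, 0, 0]

# Step 2: a running maximum replaces every 0 with the index of the last observed value.
idx = accumulate(max, marks)                              # [1, 1, 3, 3, 3]

# Step 3: indexing with idx pulls the last observed value forward.
filled = v[idx]                                           # [1, 1, 5, 5, 5]
println(filled)
```

Like the one-liner, this assumes the first element is observed; a leading gap would yield index 0 and a BoundsError.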

Upvotes: 4
