JAKE

Reputation: 263

Replace missing values with previous values in Julia Data Frame

Imagine I have a data frame like below:

enter image description here

What I want to do is fill those missing values with the previous values, so that after filling the data frame looks like this:

enter image description here

Is there any simple way that I can do this?

Upvotes: 3

Views: 804

Answers (3)

sprmnt21

Reputation: 1


using DataFrames

df = DataFrame(dt1=[missing, 0.2, missing, missing, 1, missing, 5, 6],
               dt2=[9, 0.3, missing, missing, 3, missing, 5, 6])

# Carry the last non-missing value forward; a leading missing stays missing.
filldown(v) = accumulate((x, y) -> coalesce(y, x), v, init=v[1])

transform(df, [:dt1, :dt2] .=> filldown, renamecols=false)


julia> transform(df,[:dt1,:dt2].=>filldown,renamecols=false) 
8×2 DataFrame
 Row │ dt1        dt2     
     │ Float64?   Float64 
─────┼────────────────────
   1 │ missing        9.0
   2 │       0.2      0.3
   3 │       0.2      0.3
   4 │       0.2      0.3
   5 │       1.0      3.0
   6 │       1.0      3.0
   7 │       5.0      5.0
   8 │       6.0      6.0

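# Backfill: fill down on the reversed vector, then reverse the result.
fillup(v) = reverse(filldown(reverse(v)))

# Forward-fill dt2 and backfill dt1.
transform(df, [:dt2, :dt1] .=> [filldown, fillup], renamecols=false)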
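
To see what the two helpers do without a data frame, here is a small sketch on a plain vector using only Base functions. The vector `v` and the names `down`, `up` and `both` are made up for illustration; `filldown` and `fillup` are the definitions from this answer.

```julia
# Same helpers as in the answer, applied to a plain vector (Base only).
filldown(v) = accumulate((x, y) -> coalesce(y, x), v, init=v[1])
fillup(v) = reverse(filldown(reverse(v)))

# Illustrative data: missing at both ends and in the middle.
v = [missing, 0.2, missing, missing, 1.0, missing]

down = filldown(v)          # leading missing is kept, nothing precedes it
up = fillup(v)              # trailing missing is kept, nothing follows it
both = fillup(filldown(v))  # forward-fill first, then backfill the leading gap
```

Chaining the two, as in `both`, is one way to end up with no missing values at all, provided the vector contains at least one non-missing entry.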

Upvotes: 0

loki

Reputation: 10350

A quick for loop does the trick. It may be useful to wrap it in a function.

using DataFrames
df = DataFrame(a = [1, missing, 2], b = [3, missing, 4])

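for c in 1:ncol(df)
    previous = missing  # reset per column so a value never leaks into the next column
    for r in 1:nrow(df)
        if ismissing(df[r, c])
            df[r, c] = previous  # a leading missing stays missing
        else
            previous = df[r, c]
        end
    end
end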

julia> df
3×2 DataFrame
│ Row │ a      │ b      │
│     │ Int64? │ Int64? │
├─────┼────────┼────────┤
│ 1   │ 1      │ 3      │
│ 2   │ 1      │ 3      │
│ 3   │ 2      │ 4      │
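
As for wrapping the loop into a function, one possible sketch is below. The name `fill_previous!` is hypothetical (it is not part of DataFrames.jl), and it works on a single vector so it can be tried without any packages. It assumes a leading missing should stay missing, since there is no earlier value to copy.

```julia
# Hypothetical helper (not a DataFrames.jl function): forward-fill one vector in place.
function fill_previous!(v::AbstractVector)
    previous = missing  # nothing seen yet, so a leading missing stays missing
    for i in eachindex(v)
        if ismissing(v[i])
            v[i] = previous
        else
            previous = v[i]
        end
    end
    return v
end

# Illustrative vectors, one of them starting with a missing value.
a = [1, missing, 2]
b = [missing, 3, missing]
fill_previous!(a)
fill_previous!(b)
```

With DataFrames loaded, something like `foreach(fill_previous!, eachcol(df))` should apply it to every column in place, because `eachcol` iterates over the stored column vectors.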

Upvotes: 1

Bogumił Kamiński

Reputation: 69949

This is the way to do it using Impute.jl:

julia> using Impute, DataFrames

julia> df = DataFrame(dt1=[0.2, missing, missing, 1, missing, 5, 6],
                      dt2=[0.3, missing, missing, 3, missing, 5, 6])
7×2 DataFrame
 Row │ dt1        dt2
     │ Float64?   Float64?
─────┼──────────────────────
   1 │       0.2        0.3
   2 │ missing    missing
   3 │ missing    missing
   4 │       1.0        3.0
   5 │ missing    missing
   6 │       5.0        5.0
   7 │       6.0        6.0

julia> transform(df, names(df) .=> Impute.locf, renamecols=false)
7×2 DataFrame
 Row │ dt1       dt2
     │ Float64?  Float64?
─────┼────────────────────
   1 │      0.2       0.3
   2 │      0.2       0.3
   3 │      0.2       0.3
   4 │      1.0       3.0
   5 │      1.0       3.0
   6 │      5.0       5.0
   7 │      6.0       6.0

Upvotes: 3
