JSowa
JSowa

Reputation: 10572

Missing values replaced with average of its neighbors (timeseries)

I want all missing values from dataset to replace with average of two nearest neighbors. Except of first and last cells and when neighbors are 0 (then I manually fix values). I coded this and it works, but the solution is not very smart. Is is another way to do it faster? Interpolate method is suitable for that? I'm not quite sure how does it work.

Input:

         0       1       2       3       4       5
0      0.0  1596.0  1578.0  1567.0  1580.0  1649.0
1   1554.0  1506.0     0.0  1466.0  1469.0  1503.0
2   1588.0  1510.0  1495.0  1485.0  1489.0     0.0
3   1592.0     0.0     0.0  1571.0  1647.0     0.0

Output:

         0       1       2       3       4       5
0      0.0  1596.0  1578.0  1567.0  1580.0  1649.0
1   1554.0  1506.0  1486.0  1466.0  1469.0  1503.0
2   1588.0  1510.0  1495.0  1485.0  1489.0  1540.5
3   1592.0     0.0     0.0  1571.0  1647.0     0.0

Code:

data_len = len(df)
first_col = str(df.columns[0])
last_col = str(df.columns[len(df.columns) - 1])

d = df.apply(lambda s: pd.to_numeric(s, errors="coerce"))
m = d.eq(0) | d.isna()
s = m.stack()
list = s[s].index.tolist() #list of indeces of missing values
count = len(list)

for el in list:
    if (el == ('0', first_col) or el == (str(data_len - 1), last_col)):
        continue
    next = df.at[str(int(el[0]) + 1), first_col] if el[1] == last_col else df.at[el[0], str(int(el[1]) + 1)]
    prev = df.at[str(int(el[0]) - 1), last_col] if el[1] == first_col else df.at[el[0], str(int(el[1]) - 1)]
    if prev == 0 or next == 0:
        continue
    df.at[el[0],el[1]] = (prev + next)/2

JSON of example:

{"0":{"0":0.0,"1":1554.0,"2":1588.0,"3":0.0},"1":{"0":1596.0,"1":1506.0,"2":1510.0,"3":0.0},"2":{"0":1578.0,"1":0.0,"2":1495.0,"3":1561.0},"3":{"0":1567.0,"1":1466.0,"2":1485.0,"3":1571.0},"4":{"0":1580.0,"1":1469.0,"2":1489.0,"3":1647.0},"5":{"0":1649.0,"1":1503.0,"2":0.0,"3":0.0}}

Upvotes: 1

Views: 269

Answers (1)

yatu
yatu

Reputation: 88226

Here's one approach using shift to average the neighbour's values and slice assigning back to the dataframe:

m = df==0
r = (df.shift(axis=1)+df.shift(-1,axis=1))/2
df.iloc[1:-1,1:-1] = df.mask(m,r)

print(df)

        0       1       2       3       4       5
0     0.0  1596.0  1578.0  1567.0  1580.0  1649.0
1  1554.0  1506.0  1486.0  1466.0  1469.0  1503.0
2  1588.0  1510.0  1495.0  1485.0  1489.0     0.0
3     0.0     0.0  1561.0  1571.0  1647.0     0.0

Upvotes: 1

Related Questions