pythondumb
pythondumb

Reputation: 1207

Pandas Dataframe update the row values by previous one based on condition

I have a dataframe as follows. I want to replace the row values based on the following condition.

if 3 consecutive previous row values are 0, then keep the values as it is or if only one previous row value is 0, then fill that row by rolling mean for last 3 rows for that particular IEMI.

First the dataframe (df).

IMEI        KVA          KwH            Expected KVA
55647     1307.65       1020.33            1307.65
55468     2988.00       1109.05            2988.00
55647     0.00          977.87             0.00
55467     0.00          1388.25            0.00
55647     0.00          445.37             0.00
55469     1888.97       933.48             1888.97
55647     1338.65       1120.33            1338.65
55468     2088.00       1019.05            2088.00
55647     0.00          977.87             =(1307.65+0.00+1338.65)/3=882.1
55469     1455.28       1388.25            1455.28
55648     2144.38       445.37             2144.38
55469     1888.97       933.48             1888.97

Please observe the calculation of Expected KVA column. Since the actual KVA value at 9th row is 0, the value of Expected KVA is simple average of last three observation (KVA) for IEMI 55647. But in 3rd, 4th and 5th row since 3 KVA values for IEMI 55647 is 0, therefore the Expected KVA value is also 0.

Edit Edit - Further Below is a pseudo code snippet. I have reset_index. Still It may not work

df = df.sort_values(['IMEI'],ascending=0).reset_index(drop=True)
for i in range(0,len(df)):
     if df.loc[i:i+2,'KVA']==0:
          df['Expected KVA'] = df['KVA']
     else:
        df['Expected KVA']=df.groupby('IEMI')['KVA'].rolling(min_periods=1,center=True,window=3).mean()

I believe that the above code snippet may fail as I have sorted IEMI thus index got distorted.

How can it be achieved? A for loop coupled with groupbyfunction? Any thought on this?

Upvotes: 0

Views: 925

Answers (1)

anky
anky

Reputation: 75080

Below is what i came up with:(I have added 3 extra rows with IMEI : 55674 just for testing)

Removing consecutive 0s with a group of 3 (which needs no action) and slicing on the dataframe:

import itertools
def consecutive(data, stepsize=1):
    return np.split(data, np.where(np.diff(data) != stepsize)[0]+1)

a = np.array(df[df.KVA == 0.00].index)
l = consecutive(a)
to_exclude=list(itertools.chain.from_iterable([i.tolist() for i in l if len(i)==3]))
pd.options.mode.chained_assignment = None
df1 = df.loc[~df.index.isin(to_exclude)]
>>df1
    IMEI    KVA     KwH
0   55647   1307.65 1020.33
1   55468   2988.00 1109.05
5   55469   1888.97 933.48
6   55647   1338.65 1120.33
7   55468   2088.00 1019.05
8   55647   0.00    977.87
9   55469   1455.28 1388.25
10  55648   2144.38 445.37
11  55469   1888.97 933.48
12  55674   0.00    6433.00
13  55674   1345.00 6542.00
14  55674   3456.00 6541.00

Assigning the leftover 0s with np.nan and doing a groupby with transform and fillna with the mean

df1['KVA'] = df1['KVA'].replace(0, np.nan)
df1['KVA'] = df1['KVA'].fillna(df1.fillna(0).groupby(['IMEI'])['KVA'].transform('mean'))
>>df1
    IMEI    KVA          KwH
0   55647   1307.650000 1020.33
1   55468   2988.000000 1109.05
5   55469   1888.970000 933.48
6   55647   1338.650000 1120.33
7   55468   2088.000000 1019.05
8   55647   882.100000  977.87
9   55469   1455.280000 1388.25
10  55648   2144.380000 445.37
11  55469   1888.970000 933.48
12  55674   1600.333333 6433.00
13  55674   1345.000000 6542.00
14  55674   3456.000000 6541.00

Then just concat and sort_index those which we had left out earlier:

pd.concat([df1,df.loc[df.index.isin(to_exclude)]]).sort_index()

    IMEI    KVA         KwH
0   55647   1307.650000 1020.33
1   55468   2988.000000 1109.05
2   55647   0.000000    977.87
3   55467   0.000000    1388.25
4   55647   0.000000    445.37
5   55469   1888.970000 933.48
6   55647   1338.650000 1120.33
7   55468   2088.000000 1019.05
8   55647   882.100000  977.87
9   55469   1455.280000 1388.25
10  55648   2144.380000 445.37
11  55469   1888.970000 933.48
12  55674   1600.333333 6433.00
13  55674   1345.000000 6542.00
14  55674   3456.000000 6541.00

Upvotes: 2

Related Questions