Reputation: 859
All,
I am looking for some help with the following problem. I have a way to achieve the desired result, but it requires a loop. Here is the problem:
import pandas as pd
import numpy as np
# Assumptions:
# 1. The value at minimum index is never np.nan. I have a separate piece of logic that handles it
df = pd.DataFrame(np.random.randint(0,100,size=(15, 1)), columns=list('A'))
# Indices to null
random_indices = np.random.permutation(np.arange(1, 14))[:5]
random_indices = np.sort(random_indices)
df.loc[random_indices, 'A'] = np.nan
df1, df2 = df.copy(deep=True), df.copy(deep=True)
# Approach 1
df1 = df1.fillna(method='ffill')
# Approach 2
for i in random_indices:
    df2.loc[i, 'A'] = df2.loc[i-1, 'A'] + 0.1
print(df1)
print(df2)
Please note that the value at index 0 is never np.nan and is handled separately. Approach 2 gives the desired result, but requires a loop. I would like to achieve the same result using Approach 1 or a similar function. Any help is appreciated.
Upvotes: 0
Views: 258
Reputation: 30050
df1 = df1['A'].ffill() + df1.groupby(df1['A'].ffill()).cumcount()/10
Let's elaborate on what df1.groupby(df1['A'].ffill()).cumcount()/10 does.
Take the following column A as an example:
1
NaN
2
NaN
NaN
3
df1['A'].ffill()
would be
1
1
2
2
2
3
In this part, if the column can contain duplicate values, use df1['A'].notnull().cumsum() as the grouping key instead of df1['A'].ffill(). .notnull().cumsum() starts a new group at every non-NaN value, so each value and the NaNs that follow it form their own group. .ffill() groups by the filled value itself, so two separate runs that happen to fill with the same value are merged into one group and the counter keeps increasing across them. The sketch below illustrates the difference.
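Here is a minimal sketch of that difference, using a hypothetical column in which the value 2 appears in two separate runs:
import pandas as pd
import numpy as np
# Hypothetical example: the non-NaN value 2 occurs in two separate runs
s = pd.Series([1, np.nan, 2, np.nan, 2, np.nan, 3], name='A')
# Grouping key from ffill(): both runs of 2 land in the same group,
# so the counter keeps growing across them
print(s.ffill().tolist())                                   # [1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 3.0]
print(s.groupby(s.ffill()).cumcount().tolist())             # [0, 1, 0, 1, 2, 3, 0]
# Grouping key from notnull().cumsum(): every non-NaN value starts a fresh group
print(s.notnull().cumsum().tolist())                        # [1, 1, 2, 2, 3, 3, 4]
print(s.groupby(s.notnull().cumsum()).cumcount().tolist())  # [0, 1, 0, 1, 0, 1, 0]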
pandas.DataFrame.groupby() can take a Series to determine the groups. Using the result of df1['A'].ffill() as the key, the 1st and 2nd rows fall into one group, the 3rd, 4th and 5th rows into another, and the 6th row into a third.
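As a quick sanity check (a small sketch on the same example values), the group label of every row can be inspected with ngroup():
import pandas as pd
import numpy as np
a = pd.Series([1, np.nan, 2, np.nan, np.nan, 3], name='A')
# ngroup() labels each row with the number of the group it belongs to
print(a.groupby(a.ffill()).ngroup().tolist())   # [0, 0, 1, 1, 1, 2]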
pandas.core.groupby.GroupBy.cumcount() numbers each item in each group from 0 to the length of that group - 1. For this example, it gives:
0
1
0
1
2
0
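Dividing that counter by 10 turns it into the 0.1 increments, and adding it to the forward-filled values gives the desired result. A minimal end-to-end sketch on the same example column:
import pandas as pd
import numpy as np
a = pd.Series([1, np.nan, 2, np.nan, np.nan, 3], name='A')
filled = a.ffill()                              # 1, 1, 2, 2, 2, 3
offset = a.groupby(a.ffill()).cumcount() / 10   # 0.0, 0.1, 0.0, 0.1, 0.2, 0.0
print((filled + offset).tolist())               # [1.0, 1.1, 2.0, 2.1, 2.2, 3.0]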
Full program
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(15, 1)), columns=list('A'))
# Null out 5 random interior indices (index 0 stays non-NaN)
random_indices = np.random.permutation(np.arange(1, 14))[:5]
random_indices = np.sort(random_indices)
df.loc[random_indices, 'A'] = np.nan
df1, df2 = df.copy(deep=True), df.copy(deep=True)
# Fill each NaN with the previous non-NaN value, plus 0.1 per consecutive NaN
df1 = df1['A'].ffill() + df1.groupby(df1['A'].ffill()).cumcount()/10
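Note that the last line replaces df1 (a DataFrame) with a Series. If you would rather keep the DataFrame and only update the column, that line can be written instead as (a sketch of the same idea):
df1['A'] = df1['A'].ffill() + df1.groupby(df1['A'].ffill()).cumcount()/10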
Upvotes: 2