UGuntupalli

Reputation: 859

Add offset while filling nan's in pandas

All,

I am looking for some help with the following problem. I have a way to achieve the desired result, but it requires a loop. Here is the problem:

import pandas as pd
import numpy as np


# Assumptions:
#  1. The value at index 0 is never np.nan; a separate piece of logic handles that case.


df = pd.DataFrame(np.random.randint(0,100,size=(15, 1)), columns=list('A'))

# Indices to null
random_indices = np.random.permutation(np.arange(1, 14))[:5]
random_indices = np.sort(random_indices)
df.loc[random_indices, 'A'] = np.nan
df1, df2 = df.copy(deep=True), df.copy(deep=True)

# Approach 1: plain forward-fill (no offset)
df1 = df1.ffill()

# Approach 2: fill each NaN with the previous (already filled) value + 0.1
for i in random_indices:
    df2.loc[i, 'A'] = df2.loc[i-1, 'A'] + 0.1

print(df1)
print(df2)

Please note that the value at index 0 is never np.nan and is handled separately. Approach 2 gives the desired result, but requires a loop. I would like to achieve the same result using Approach 1 or a similar function. Any help is appreciated.
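For concreteness, here is a small made-up illustration of the behaviour I am after (the values are arbitrary, since the real data is random):

import numpy as np
import pandas as pd

# Hypothetical input and desired output: each NaN becomes the previous
# (already filled) value plus 0.1, so a run of NaNs turns into +0.1, +0.2, ...
before = pd.Series([50.0, np.nan, np.nan, 7.0, np.nan, 33.0])
after  = pd.Series([50.0, 50.1,   50.2,   7.0, 7.1,    33.0])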

Upvotes: 0

Views: 258

Answers (1)

Ynjxsjmh

Reputation: 30050

df1 = df1['A'].ffill() + df1.groupby(df1['A'].ffill()).cumcount()/10

Let's elaborate on what df1.groupby(df1['A'].ffill()).cumcount()/10 does.

Take the following dataframe as an example, showing column A alongside its forward-filled version:

  A    A.ffill()
  1    1
NaN    1
  2    2
NaN    2
NaN    2
  3    3
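A minimal sketch of that grouping step, with the example column hard-coded by hand:

import numpy as np
import pandas as pd

# The example column A from the table above
s = pd.Series([1, np.nan, 2, np.nan, np.nan, 3], name='A')

# Forward-filling produces the grouping key: each NaN falls into the group
# of the last non-null value seen before it
print(s.ffill().tolist())   # [1.0, 1.0, 2.0, 2.0, 2.0, 3.0]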

Grouping by the forward-filled column puts every NaN into the same group as the last non-null value before it, and cumcount() then numbers the rows within each group:

0
1
0
1
2
0

Dividing by 10 turns those counters into the 0.1, 0.2, ... offsets that get added on top of the forward-filled values.

If you have duplicate values before a NaN, you can use df1['A'].notnull().cumsum() instead of df1['A'].ffill() as the grouping key. .notnull().cumsum() classifies a NaN into the same group as only the one non-null value directly before it, while .ffill() also classifies earlier adjacent rows with the same value into that group, which inflates the counter.
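A small sketch of that difference, using an invented series with a duplicate 2 right before a NaN:

import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 2, 2, np.nan, 3], name='A')

# ffill() key: both 2s and the following NaN land in one group "2",
# so the NaN's counter is 2 and it would get offset 0.2
print(s.groupby(s.ffill()).cumcount().tolist())             # [0, 1, 0, 1, 2, 0]

# notnull().cumsum() key: the NaN is grouped only with the 2 directly
# before it, so its counter is 1 and it gets offset 0.1 as intended
print(s.groupby(s.notnull().cumsum()).cumcount().tolist())  # [0, 1, 0, 0, 1, 0]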

Full program

import pandas as pd
import numpy as np


# Same setup as the question: 15 random integers, with 5 values at random
# indices (never index 0) replaced by NaN
df = pd.DataFrame(np.random.randint(0,100,size=(15, 1)), columns=list('A'))

random_indices = np.random.permutation(np.arange(1, 14))[:5]
random_indices = np.sort(random_indices)
df.loc[random_indices, 'A'] = np.nan

df1, df2 = df.copy(deep=True), df.copy(deep=True)

# Vectorized fill: forward-filled value + (position within its ffill group)/10
df1 = df1['A'].ffill() + df1.groupby(df1['A'].ffill()).cumcount()/10
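A quick sanity check against the question's looped Approach 2 (the two should agree whenever the randomly drawn values in column A are distinct; with duplicates, switch the grouping key to df1['A'].notnull().cumsum() as noted above):

# Re-run the question's loop on the untouched copy df2
for i in random_indices:
    df2.loc[i, 'A'] = df2.loc[i - 1, 'A'] + 0.1

# Compare the vectorized result with the looped result
print(np.allclose(df1.to_numpy(), df2['A'].to_numpy()))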

Upvotes: 2
