Wen Kai Yeam
Wen Kai Yeam

Reputation: 11

How to replace missing data with last available value in Pandas

is there anyway in pandas I can do to do imputation from its last available value for each ID?

ID Week Category
1001 1 A
1001 2 A
1001 3
1001 4 B
1002 1 B
1002 2
1002 3 C
1002 4

I would like to transform it to below dataframe

ID Week Category
1001 1 A
1001 2 A
1001 3 A
1001 4 B
1002 1 B
1002 2 B
1002 3 C
1002 4 C

Thanks in advance for the helps.

Upvotes: 1

Views: 850

Answers (1)

Cimbali
Cimbali

Reputation: 11395

You have not really described what you wanted, but it looks like you’re looking for ffill() (or fillna(method='ffill')). This replaces missing values (na) with previous elements (hence forward fill).

>>> df
      Week Category
ID                 
1001     1        A
1001     2        A
1001     3     None
1001     4        B
1002     1        B
1002     2     None
1002     3        C
1002     4     None
>>> df['Category'] = df['Category'].ffill()
>>> df
      Week Category
ID                 
1001     1        A
1001     2        A
1001     3        A
1001     4        B
1002     1        B
1002     2        B
1002     3        C
1002     4        C

If your missing values are something else, for example empty strings, you can remove those with .mask() and then use the same solution.

>>> df['Category'].mask(df['Category'] == '').ffill()
ID
1001    A
1001    A
1001    A
1001    B
1002    B
1002    B
1002    C
1002    C
Name: Category, dtype: object

If you want to limit that per ID, you can use GroupBy.ffill().

This only gives different results from the previous solution if you have an empty value at the beginning of an ID. If you don’t want it to be populated from the previous ID, see below:

>>> df
      Week Category
ID                 
1001     1        A
1001     2        A
1001     3     None
1001     4        B
1002     1     None
1002     2     None
1002     3        C
1002     4     None
>>> df.groupby('ID')['Category'].ffill()
ID
1001      A
1001      A
1001      A
1001      B
1002    NaN
1002    NaN
1002      C
1002      C
Name: Category, dtype: object
>>> df['Category'].ffill()
ID
1001    A
1001    A
1001    A
1001    B
1002    B
1002    B
1002    C
1002    C
Name: Category, dtype: object

Upvotes: 3

Related Questions