Reputation: 11
is there anyway in pandas I can do to do imputation from its last available value for each ID?
ID | Week | Category |
---|---|---|
1001 | 1 | A |
1001 | 2 | A |
1001 | 3 | |
1001 | 4 | B |
1002 | 1 | B |
1002 | 2 | |
1002 | 3 | C |
1002 | 4 |
I would like to transform it to below dataframe
ID | Week | Category |
---|---|---|
1001 | 1 | A |
1001 | 2 | A |
1001 | 3 | A |
1001 | 4 | B |
1002 | 1 | B |
1002 | 2 | B |
1002 | 3 | C |
1002 | 4 | C |
Thanks in advance for the helps.
Upvotes: 1
Views: 850
Reputation: 11395
You have not really described what you wanted, but it looks like you’re looking for ffill()
(or fillna(method='ffill')
). This replaces missing values (na
) with previous elements (hence f
orward fill
).
>>> df
Week Category
ID
1001 1 A
1001 2 A
1001 3 None
1001 4 B
1002 1 B
1002 2 None
1002 3 C
1002 4 None
>>> df['Category'] = df['Category'].ffill()
>>> df
Week Category
ID
1001 1 A
1001 2 A
1001 3 A
1001 4 B
1002 1 B
1002 2 B
1002 3 C
1002 4 C
If your missing values are something else, for example empty strings, you can remove those with .mask()
and then use the same solution.
>>> df['Category'].mask(df['Category'] == '').ffill()
ID
1001 A
1001 A
1001 A
1001 B
1002 B
1002 B
1002 C
1002 C
Name: Category, dtype: object
If you want to limit that per ID, you can use GroupBy.ffill()
.
This only gives different results from the previous solution if you have an empty value at the beginning of an ID. If you don’t want it to be populated from the previous ID, see below:
>>> df
Week Category
ID
1001 1 A
1001 2 A
1001 3 None
1001 4 B
1002 1 None
1002 2 None
1002 3 C
1002 4 None
>>> df.groupby('ID')['Category'].ffill()
ID
1001 A
1001 A
1001 A
1001 B
1002 NaN
1002 NaN
1002 C
1002 C
Name: Category, dtype: object
>>> df['Category'].ffill()
ID
1001 A
1001 A
1001 A
1001 B
1002 B
1002 B
1002 C
1002 C
Name: Category, dtype: object
Upvotes: 3