Cindy
Cindy

Reputation: 568

Python - fill NA by value from previous rows based on identifier column

I would like to fill missing value in 2 columns. There are Date and Cat2 should be filled with the value of another row based on the last date for predefined Cat1 (predefined in previous filled rows), for example:

Data Example:

Day  Date      Cat1      Cat2
1    31/12/17  cat       mouse
2    01/09/18  cat       mouse
3    27/05/18  dog       elephant
4       NaN    cat       NaN
5       NaN    cat       NaN

So, NaN of Date column should be filled based on the last existed date for cat values from Cat1 column - 01/09/18. The NaN in Cat2 column should be filled by mouse as defined values in row which has already selected for Date - 01/09/18.

I read that looping through each row would be very bad practice and that it would be better to do everything in one way.

I'd appreciate for any idea.Thanks)

I tried to do it by fillna method, but it fill by last values without condition for Cat1.

data.fillna(method='ffill', inplace = True)

Actual result is:

Day  Date      Cat1      Cat2
1    31/12/17  cat       mouse
2    01/09/18  cat       mouse
3    27/05/18  dog       elephant
4    27/05/18  cat       elephant
5    27/05/18  cat       elephant

Expected result should be:

Day  Date      Cat1      Cat2
1    31/12/17  cat       mouse
2    01/09/18  cat       mouse
3    27/05/18  dog       elephant
4    01/09/18  cat       mouse
5    01/09/18  cat       mouse

Upvotes: 3

Views: 5236

Answers (3)

Alexandr  Kosolapov
Alexandr Kosolapov

Reputation: 163

You can use a dict:

dict = df[['cat','cat2']].drop_duplicates(keep='first', inplace=False).to_dict()
df['cat2'] = df['cat1'].replace(dict)

for timeseries-column you can using timedelta (linear extrapolation). good luck!

Upvotes: 0

Cindy
Cindy

Reputation: 568

Also, alternative way is:

data = data.groupby(['Cat1'], as_index = False).apply(lambda group: group.ffill())

Thanks, @Jezrael for idea of solution:)

Upvotes: 3

jezrael
jezrael

Reputation: 862681

I believe you need GroupBy.ffill with DataFrame.reindex for same order like original DataFrame:

df = df.groupby('Cat1').ffill().reindex(df.columns, axis=1)
print (df)
   Day      Date Cat1      Cat2
0    1  31/12/17  cat     mouse
1    2  01/09/18  cat     mouse
2    3  27/05/18  dog  elephant
3    4  01/09/18  cat     mouse
4    5  01/09/18  cat     mouse

Upvotes: 3

Related Questions