Reputation: 568
I would like to fill missing value in 2 columns. There are Date and Cat2 should be filled with the value of another row based on the last date for predefined Cat1 (predefined in previous filled rows), for example:
Data Example:
Day Date Cat1 Cat2
1 31/12/17 cat mouse
2 01/09/18 cat mouse
3 27/05/18 dog elephant
4 NaN cat NaN
5 NaN cat NaN
So, NaN
of Date
column should be filled based on the last existed date for cat
values from Cat1
column - 01/09/18
. The NaN
in Cat2
column should be filled by mouse
as defined values in row which has already selected for Date
- 01/09/18
.
I read that looping through each row would be very bad practice and that it would be better to do everything in one way.
I'd appreciate for any idea.Thanks)
I tried to do it by fillna
method, but it fill by last values without condition for Cat1.
data.fillna(method='ffill', inplace = True)
Actual result is:
Day Date Cat1 Cat2
1 31/12/17 cat mouse
2 01/09/18 cat mouse
3 27/05/18 dog elephant
4 27/05/18 cat elephant
5 27/05/18 cat elephant
Expected result should be:
Day Date Cat1 Cat2
1 31/12/17 cat mouse
2 01/09/18 cat mouse
3 27/05/18 dog elephant
4 01/09/18 cat mouse
5 01/09/18 cat mouse
Upvotes: 3
Views: 5236
Reputation: 163
You can use a dict:
dict = df[['cat','cat2']].drop_duplicates(keep='first', inplace=False).to_dict()
df['cat2'] = df['cat1'].replace(dict)
for timeseries-column you can using timedelta (linear extrapolation). good luck!
Upvotes: 0
Reputation: 568
Also, alternative way is:
data = data.groupby(['Cat1'], as_index = False).apply(lambda group: group.ffill())
Thanks, @Jezrael for idea of solution:)
Upvotes: 3
Reputation: 862681
I believe you need GroupBy.ffill
with DataFrame.reindex
for same order like original DataFrame
:
df = df.groupby('Cat1').ffill().reindex(df.columns, axis=1)
print (df)
Day Date Cat1 Cat2
0 1 31/12/17 cat mouse
1 2 01/09/18 cat mouse
2 3 27/05/18 dog elephant
3 4 01/09/18 cat mouse
4 5 01/09/18 cat mouse
Upvotes: 3