Reputation: 323
I'm trying to replace the 'target' value in the last row of each group (from a groupby) with the value of another column, but only if it is null. I can do both of these pieces separately but can't seem to combine them. Does anyone have any ideas?
These are the separate pieces:
# replace NaN values in 'target' with values from 'value'
df.loc[df['target'].isnull(),'target'] = df['value']
# replace the last 'target' in each group with the value from 'value'
df.loc[df.groupby('id').tail(1).index,'target'] = df['value']
Original Data:
date id value target
0 2020-08-07 id01 0.100775 NaN
1 2020-08-08 id01 0.215885 0.215885
2 2020-08-09 id01 0.012154 0.012154
3 2020-08-10 id01 0.374503 NaN
4 2020-08-07 id02 0.369707 0.369707
5 2020-08-08 id02 0.676743 0.676743
6 2020-08-09 id02 0.659521 0.659521
7 2020-08-10 id02 0.799071 NaN
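For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: the dtype of 'date' is an assumption (parsed to datetimes here), since the printed table does not show it.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the table above; NaN marks a missing 'target'.
# Assumption: 'date' is a datetime column (the printout does not say).
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-08-07', '2020-08-08',
                            '2020-08-09', '2020-08-10'] * 2),
    'id': ['id01'] * 4 + ['id02'] * 4,
    'value': [0.100775, 0.215885, 0.012154, 0.374503,
              0.369707, 0.676743, 0.659521, 0.799071],
    'target': [np.nan, 0.215885, 0.012154, np.nan,
               0.369707, 0.676743, 0.659521, np.nan],
})
print(df)
```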
Desired output: in the last row of each groupby('id') group, fill a null 'target' with what is in 'value':
date id value target
0 2020-08-07 id01 0.100775 NaN
1 2020-08-08 id01 0.215885 0.215885
2 2020-08-09 id01 0.012154 0.012154
3 2020-08-10 id01 0.374503 0.374503
4 2020-08-07 id02 0.369707 0.369707
5 2020-08-08 id02 0.676743 0.676743
6 2020-08-09 id02 0.659521 0.659521
7 2020-08-10 id02 0.799071 0.799071
Upvotes: 4
Views: 563
Reputation: 26676
With combine_first, you are spoiled for choice.
First option
Use .groupby() with .nth(-1)
m = df.groupby('id', as_index=False).nth(-1).index  # index of the last row in each group
df.loc[m, 'target'] = df['target'].combine_first(df['value'])  # populate null targets from value
Second Option
Use combine_first with the .iloc accessor to get the last values in the target column and the value column in a one-line groupby
df.groupby('id').apply(lambda x:(x.iloc[-1:,3].combine_first(x.iloc[-1:,2])))\
.reset_index(level=0).combine_first(df)
Third Option
Select the last index in each group. Populate column target as required and update df using combine_first
g = df.groupby('id').apply(lambda x: x.iloc[-1:]).reset_index(level=0, drop=True)  # last row of each group
g['target'] = g['target'].fillna(g['value'])  # fill only the null targets
g.combine_first(df)
date id value target
0 2020-08-07 id01 0.100775 NaN
1 2020-08-08 id01 0.215885 0.215885
2 2020-08-09 id01 0.012154 0.012154
3 2020-08-10 id01 0.374503 0.374503
4 2020-08-07 id02 0.369707 0.369707
5 2020-08-08 id02 0.676743 0.676743
6 2020-08-09 id02 0.659521 0.659521
7 2020-08-10 id02 0.799071 0.799071
Upvotes: 1
Reputation: 2811
Find the index of the last row in each group with groupby() and then replace only null values using .combine_first()
indexes = df.groupby('id').tail(1).index
df.loc[indexes, 'target'] = df['target'].combine_first(df['value'])
#result
date id value target
0 2020-08-07 id01 0.100775 NaN
1 2020-08-08 id01 0.215885 0.215885
2 2020-08-09 id01 0.012154 0.012154
3 2020-08-10 id01 0.374503 0.374503
4 2020-08-07 id02 0.369707 0.369707
5 2020-08-08 id02 0.676743 0.676743
6 2020-08-09 id02 0.659521 0.659521
7 2020-08-10 id02 0.799071 0.799071
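To check that this only touches null targets, here is a small sketch on a made-up frame (the values, including the 9.9, are hypothetical and not from the question) where one group's last 'target' is already filled:

```python
import numpy as np
import pandas as pd

# Hypothetical data: id01 ends on a null target, id02 ends on a non-null one.
df = pd.DataFrame({
    'id': ['id01', 'id01', 'id02', 'id02'],
    'value': [0.1, 0.2, 0.3, 0.4],
    'target': [np.nan, np.nan, 0.3, 9.9],
})

# Index labels of the last row in each 'id' group.
last = df.groupby('id').tail(1).index

# combine_first keeps existing targets and falls back to 'value' where null;
# .loc restricts the assignment to the last rows only.
df.loc[last, 'target'] = df['target'].combine_first(df['value'])
print(df)
```

Row 1 is filled from 'value', row 3 keeps its existing 9.9, and the null in row 0 is left alone because it is not the last row of its group.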
Upvotes: 1
Reputation: 59569
Use fillna on the entire column, then mask back to NaN any missing value that isn't in the last row for its 'id'.
m = df['target'].isnull() & df['id'].duplicated(keep='last')
df['target'] = df['target'].fillna(df['value']).mask(m)
date id value target
0 2020-08-07 id01 0.100775 NaN
1 2020-08-08 id01 0.215885 0.215885
2 2020-08-09 id01 0.012154 0.012154
3 2020-08-10 id01 0.374503 0.374503
4 2020-08-07 id02 0.369707 0.369707
5 2020-08-08 id02 0.676743 0.676743
6 2020-08-09 id02 0.659521 0.659521
7 2020-08-10 id02 0.799071 0.799071
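To make the mask easier to follow, here is a sketch on a small made-up frame (hypothetical values; the name not_last is introduced only for illustration). duplicated(keep='last') is True for every row of an 'id' except its last occurrence:

```python
import numpy as np
import pandas as pd

# Hypothetical data: nulls in first, middle-of-group and last positions.
df = pd.DataFrame({
    'id': ['id01', 'id01', 'id01', 'id02', 'id02'],
    'value': [1.0, 2.0, 3.0, 4.0, 5.0],
    'target': [np.nan, 2.0, np.nan, np.nan, np.nan],
})

# True for every row that is NOT the last occurrence of its 'id'.
not_last = df['id'].duplicated(keep='last')

# Rows that were null and are not the last of their group must stay null.
m = df['target'].isnull() & not_last

# Fill everything, then put NaN back where the mask is True.
df['target'] = df['target'].fillna(df['value']).mask(m)
print(df)
```

Only rows 2 and 4 (the last row of each group) are filled from 'value'; rows 0 and 3 are set back to NaN.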
Upvotes: 3
Reputation: 6132
This should do it. I added the tail variable just to make the syntax easier to read:
tail = df.groupby('id').tail(1)
df.loc[tail.index, 'target'] = tail['target'].fillna(tail['value'])
Output:
idx date id value target
0 2020-08-07 id01 0.100775 NaN
1 2020-08-08 id01 0.215885 0.215885
2 2020-08-09 id01 0.012154 0.012154
3 2020-08-10 id01 0.374503 0.374503
4 2020-08-07 id02 0.369707 0.369707
5 2020-08-08 id02 0.676743 0.676743
6 2020-08-09 id02 0.659521 0.659521
7 2020-08-10 id02 0.799071 0.799071
Upvotes: 3