Reputation: 6132
I'm working with a dataset, from which a subset has initial values and final values. I created an id
that lets me identify those observations, so after applying this:
df['aux'] = df.duplicated(subset=['id'], keep=False)
df_dup = df[df.aux]
df_dup = df_dup.sort_values(by='id').reset_index()
I get something like this:
index id status value
88 1 'initial' 8
95 1 'final' 12
63 2 'initial' 9
52 2 'final' 13
What I want to achieve is to replace the initial value with the final value:
index id status value
88 1 'initial' 12
95 1 'final' 12
63 2 'initial' 13
52 2 'final' 13
I tried several things, my last attempt was this:
df_dup[df_dup.status=='initial'].reset_index().value= \
df_dup[df_dup.status=='final'].reset_index().value
But that fills the initial values with nan:
index id status value
88 1 'initial' nan
95 1 'final' 12
63 2 'initial' nan
52 2 'final' 13
What am I missing? Thanks
Upvotes: 2
Views: 45
Reputation: 323226
Without groupby, and based on drop_duplicates:
df.value=df.id.map(df.drop_duplicates('id',keep='last').set_index('id').value)
df
Out[436]:
index id status value
0 88 1 'initial' 12
1 95 1 'final' 12
2 63 2 'initial' 13
3 52 2 'final' 13
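For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and splits the one-liner into two steps. The frame construction and the name `lookup` are my own assumptions for illustration (in particular, that `status` holds plain strings without the quotes), and it relies on the 'final' row being the last row of each id:

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: status holds plain strings).
df = pd.DataFrame({
    "index": [88, 95, 63, 52],
    "id": [1, 1, 2, 2],
    "status": ["initial", "final", "initial", "final"],
    "value": [8, 12, 9, 13],
})

# Keep the last row of each id and turn it into an id -> value lookup Series.
# `lookup` is a hypothetical helper name, not part of the original answer.
lookup = df.drop_duplicates("id", keep="last").set_index("id")["value"]

# Map every row's id to the looked-up value.
df["value"] = df["id"].map(lookup)
print(df)
```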
Upvotes: 2
Reputation: 862641
Use GroupBy.transform with last. It also replaces the values for unique ids, but for those it returns the same value:
df['value'] = df.groupby('id')['value'].transform('last')
print (df)
index id status value
0 88 1 'initial' 12
1 95 1 'final' 12
2 63 2 'initial' 13
3 52 2 'final' 13
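Note that 'last' depends on the 'final' row coming last within each id. If that order is not guaranteed, a possible variation (not what the line above does, just a sketch under the assumption that each id has at most one 'final' row) is to pick the value by status instead of by position. The sample data and the names `final_only` and `final_per_id` are made up for illustration:

```python
import pandas as pd

# Hypothetical data: id 1 has 'final' first, id 3 has no 'final' row at all.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3],
    "status": ["final", "initial", "initial", "final", "initial"],
    "value": [12, 8, 9, 13, 7],
})

# Blank out everything that is not a 'final' value (those rows become NaN).
final_only = df["value"].where(df["status"].eq("final"))

# 'first' skips NaN, so each row receives the 'final' value of its id.
final_per_id = final_only.groupby(df["id"]).transform("first")

# ids without a 'final' row fall back to their original value.
df["value"] = final_per_id.fillna(df["value"]).astype(int)
print(df)
```

The round trip through NaN makes the column float, hence the astype(int) at the end.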
If you want to replace only the rows with a duplicated id (better performance when there are many unique values):
mask = df.duplicated(subset=['id'], keep=False)
df.loc[mask, 'value'] = df[mask].groupby('id')['value'].transform('last')
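To check that both variants agree when unique ids are present, here is a small sketch. The sample data (with an extra unique id 3) and the names `all_groups` and `masked` are assumptions for illustration only:

```python
import pandas as pd

# Hypothetical data: ids 1 and 2 are duplicated, id 3 occurs only once.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3],
    "status": ["initial", "final", "initial", "final", "initial"],
    "value": [8, 12, 9, 13, 7],
})

# Variant 1: transform over every group, including the single-row id 3.
all_groups = df.groupby("id")["value"].transform("last")

# Variant 2: only touch rows whose id occurs more than once.
mask = df.duplicated(subset=["id"], keep=False)
masked = df["value"].copy()
masked.loc[mask] = df[mask].groupby("id")["value"].transform("last")

# Both Series should hold the same values; the unique row keeps its 7.
print(all_groups.tolist())
print(masked.tolist())
```

The masked variant only pays off when most ids are unique, since it skips the group work for those rows.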
Upvotes: 2