Juan C

Reputation: 6132

Replace values based on index pandas

I'm working with a dataset in which a subset of observations has both an initial value and a final value. I created an id that identifies those observations, so after applying this:

df['aux'] = df.duplicated(subset=['id'], keep=False)

df_dup = df[df.aux]
df_dup = df_dup.sort_values(by='id').reset_index()

I get something like this:

index     id    status    value
  88      1    'initial'    8
  95      1    'final'      12
  63      2    'initial'    9
  52      2    'final'      13
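A self-contained version of the filtering step, as a sketch on made-up data that mirrors the sample above plus a hypothetical id 3 that appears only once. Note that `reset_index(inplace=True)` called on the result of `sort_values` only modifies a temporary copy, so the sketch assigns the result instead.

```python
import pandas as pd

# Toy data mirroring the question; id 3 is a hypothetical unpaired row
df = pd.DataFrame(
    {
        "id": [1, 1, 2, 2, 3],
        "status": ["initial", "final", "initial", "final", "initial"],
        "value": [8, 12, 9, 13, 5],
    },
    index=[88, 95, 63, 52, 70],
)

# keep=False flags every row whose id occurs more than once
df["aux"] = df.duplicated(subset=["id"], keep=False)

# Filter on the boolean column, then sort; a stable sort keeps
# each id's initial row ahead of its final row
df_dup = df[df["aux"]].sort_values(by="id", kind="stable")

# reset_index() returns a new frame; the old labels move to an 'index' column
df_dup = df_dup.reset_index()
print(df_dup)
```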

What I want to achieve is to copy each id's final value into its initial row:

index     id    status    value
  88      1    'initial'    12
  95      1    'final'      12
  63      2    'initial'    13
  52      2    'final'      13

I tried several things, my last attempt was this:

df_dup[df_dup.status=='initial'].reset_index().value= \
df_dup[df_dup.status=='final'].reset_index().value

But that fills the initial values with NaN:

index     id    status    value
  88      1    'initial'    nan
  95      1    'final'      12
  63      2    'initial'    nan
  52      2    'final'      13

What am I missing? Thanks
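The NaN is what pandas index alignment produces: when a Series is assigned into a selection (for example with `.loc`), values are matched by index label, not by position. Initial and final rows never share a label, so every initial row gets NaN. (Separately, `reset_index()` returns a new frame, so assigning into its result does not change `df_dup` itself.) A sketch on toy data, assuming the status column holds the plain strings 'initial' and 'final':

```python
import pandas as pd

# Toy data with the labels from the question
df_dup = pd.DataFrame(
    {
        "id": [1, 1, 2, 2],
        "status": ["initial", "final", "initial", "final"],
        "value": [8, 12, 9, 13],
    },
    index=[88, 95, 63, 52],
)

is_initial = df_dup["status"] == "initial"
is_final = df_dup["status"] == "final"

# Label alignment: the initial labels (88, 63) do not exist among
# the final labels (95, 52), so every looked-up value is NaN
aligned = df_dup.loc[is_final, "value"].reindex(df_dup.loc[is_initial].index)
print(aligned)

# Dropping the labels with .to_numpy() makes the assignment positional.
# Assumption: rows are sorted by id and each id has exactly one initial
# and one final row, so the n-th initial row pairs with the n-th final row
df_dup.loc[is_initial, "value"] = df_dup.loc[is_final, "value"].to_numpy()
print(df_dup)
```

The positional fix is fragile if the sort order or the one-initial/one-final assumption breaks; the id-based approaches in the answers avoid that.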

Upvotes: 2

Views: 45

Answers (2)

BENY

Reputation: 323226

Without groupby, using drop_duplicates to build a lookup table from each id to its last (final) value:

df['value'] = df['id'].map(df.drop_duplicates('id', keep='last').set_index('id')['value'])
print(df)
   index  id     status  value
0     88   1  'initial'     12
1     95   1    'final'     12
2     63   2  'initial'     13
3     52   2    'final'     13
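`keep='last'` picks the final row only because it comes after the initial row for each id. A sketch of a variant that selects the lookup rows by status instead, so row order does not matter. It assumes the status column holds the plain strings 'initial' and 'final'; the reordered id 2 and the unpaired id 3 are hypothetical, and ids without a final row fall back to their own value via `fillna`.

```python
import pandas as pd

# Toy data: id 2's final row comes first; id 3 has no final row
df = pd.DataFrame(
    {
        "id": [1, 1, 2, 2, 3],
        "status": ["initial", "final", "final", "initial", "initial"],
        "value": [8, 12, 13, 9, 5],
    },
    index=[88, 95, 52, 63, 70],
)

# Lookup table id -> final value, chosen by status rather than by position
final_by_id = df.loc[df["status"] == "final"].set_index("id")["value"]

# map() yields NaN for ids with no final row; keep their original value
df["value"] = df["id"].map(final_by_id).fillna(df["value"])
print(df)
```

On this toy frame, `drop_duplicates('id', keep='last')` would map id 2 to 9, because its initial row is the later one.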

Upvotes: 2

jezrael

Reputation: 862641

Use GroupBy.transform with 'last'. It also runs on ids that appear only once, but for those rows it simply returns the row's own value:

df['value'] = df.groupby('id')['value'].transform('last')
print (df)
   index  id     status  value
0     88   1  'initial'     12
1     95   1    'final'     12
2     63   2  'initial'     13
3     52   2    'final'     13
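`transform` returns a Series with the same index as the original frame, so each row receives its group's last value and the result can be assigned straight back as a column. A sketch on toy data, where the single-row id 3 is hypothetical; like `keep='last'`, this relies on the final row being the later one within each id.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 1, 2, 2, 3],
        "status": ["initial", "final", "initial", "final", "initial"],
        "value": [8, 12, 9, 13, 5],
    },
    index=[88, 95, 63, 52, 70],
)

# One value per row: the last 'value' in that row's id group
last_per_row = df.groupby("id")["value"].transform("last")
print(last_per_row)

# Same index as df, so it assigns back row by row
df["value"] = last_per_row
print(df)
```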

If you want to replace values only in rows whose id is duplicated (with many unique ids this can be faster, since fewer rows are grouped):

mask = df.duplicated(subset=['id'], keep=False)
df.loc[mask, 'value'] = df[mask].groupby('id')['value'].transform('last')
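A quick check of the masked version on toy data (id 3 is hypothetical): only the duplicated rows are grouped, and the `.loc` assignment aligns the transformed values back on their original index labels, so the unpaired row keeps its value.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 1, 2, 2, 3],
        "status": ["initial", "final", "initial", "final", "initial"],
        "value": [8, 12, 9, 13, 5],
    },
    index=[88, 95, 63, 52, 70],
)

# True for rows 88, 95, 63, 52; False for the unpaired row 70
mask = df.duplicated(subset=["id"], keep=False)

# The transformed Series is indexed by 88, 95, 63, 52 and aligns back on those labels
df.loc[mask, "value"] = df[mask].groupby("id")["value"].transform("last")
print(df)
```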

Upvotes: 2
