John Doe

Reputation: 677

Replace values in Pandas DataFrame with Unique values from the same DataFrame

I have a pandas DataFrame in which some values are incorrect:

import pandas as pd

data = {'Model':['A', 'B', 'A', 'B', 'A'], 'Value':[20, 40, 20, 40, -1]}
df = pd.DataFrame(data)
df

Out[46]: 
  Model  Value
0     A     20
1     B     40
2     A     20
3     B     40
4     A     -1

I would like to replace -1 with the unique valid value for the same Model. For Model A in this case, that is 20.

How do I go about it? I have tried the following. In my case it is a large DataFrame with 2 million rows.


df2 = df[df['Value'] != -1]
pd.merge(df, df2, on='Model', how='left')

Out:
MemoryError: Unable to allocate 5.74 TiB for an array with shape (788568381621,) and data type int64

Upvotes: 0

Views: 48

Answers (2)

CHRD

Reputation: 1957

Here's a quick solution:

df['Value'] = df.groupby('Model')['Value'].transform('max')
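
As a hedged check against the sample frame from the question: taking the group maximum only gives the right answer if every valid value is larger than the -1 placeholder. That is an assumption about the data, not something the question states. A minimal sketch (the name `filled` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Model': ['A', 'B', 'A', 'B', 'A'],
                   'Value': [20, 40, 20, 40, -1]})

# Per-group maximum of 'Value', broadcast back to every row of its group.
# Assumption: valid values are always greater than the -1 placeholder.
filled = df.groupby('Model')['Value'].transform('max')
print(filled.tolist())  # [20, 40, 20, 40, 20]
```

If a group could hold several different valid values, this would silently pick the largest one, so it may be worth checking that each Model really has a single valid value first.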

Upvotes: 1

Quang Hoang

Reputation: 150735

You don't need a merge here: it pairs every row with every valid row that has the same Model, and that is what exhausts memory. The following will do:

df['Value'] = df['Value'].mask(df['Value'] == -1).groupby(df['Model']).transform('first')

Or you can also use map:

s = (df[df['Value'] != -1].drop_duplicates('Model')
         .set_index('Model')['Value'])
df['Value'] = df['Model'].map(s)
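
To make the row blow-up concrete, here is a small sketch on the sample frame from the question. It counts the rows the merge produces and then compares the two merge-free approaches. The names `valid`, `merged`, `lookup`, `mapped` and `first` are illustrative only, and the sketch assumes each Model has at least one valid value.

```python
import pandas as pd

df = pd.DataFrame({'Model': ['A', 'B', 'A', 'B', 'A'],
                   'Value': [20, 40, 20, 40, -1]})

valid = df[df['Value'] != -1]

# Many-to-many merge: each row is paired with every valid row of its Model,
# so A contributes 3 * 2 rows and B contributes 2 * 2 rows.
merged = pd.merge(df, valid, on='Model', how='left')
print(len(df), len(merged))  # 5 10

# Approach 1: one valid value per Model as a lookup, mapped onto the column.
lookup = valid.drop_duplicates('Model').set_index('Model')['Value']
mapped = df['Model'].map(lookup)

# Approach 2: turn the placeholder into NaN, then take the first
# non-null value in each group (the result is float because of the NaN).
first = (df['Value'].mask(df['Value'] == -1)
         .groupby(df['Model']).transform('first'))

print(mapped.tolist())  # [20, 40, 20, 40, 20]
```

The merged row count grows roughly with the square of each group's size, which is why a 2 million row frame with few distinct models can ask for terabytes, while both alternatives keep the result at the original length.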

Upvotes: 1
