Reputation: 677
I have a pandas DataFrame in which some values are incorrect:
import pandas as pd
data = {'Model':['A', 'B', 'A', 'B', 'A'], 'Value':[20, 40, 20, 40, -1]}
df = pd.DataFrame(data)
df
Out[46]:
  Model  Value
0     A     20
1     B     40
2     A     20
3     B     40
4     A     -1
I would like to replace each -1 with the value that the same Model has in its other rows. In this case the -1 for A should become 20.
How do I go about it? I have tried the following. In my case it's a large DataFrame with 2 million rows.
df2 = df[df.Value != -1]
pd.merge(df, df2, on='Model', how='left')
Out:
MemoryError: Unable to allocate 5.74 TiB for an array with shape (788568381621,) and data type int64
Upvotes: 0
Views: 48
Reputation: 1957
Here's a quick solution, assuming the valid value for each Model is always greater than -1:
df['Value'] = df.groupby('Model')['Value'].transform('max')
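A self-contained sketch of this idea on the sample data from the question. It selects the Value column explicitly before calling transform, and it assumes that -1 is the only placeholder and that every valid value is greater than -1, so the group maximum is the value to keep. The name filled is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Model': ['A', 'B', 'A', 'B', 'A'],
                   'Value': [20, 40, 20, 40, -1]})

# transform returns a Series aligned to df's index, one value per row.
# Assumption: the valid value in each Model is larger than the -1 placeholder.
filled = df.groupby('Model')['Value'].transform('max')
df['Value'] = filled

print(df['Value'].tolist())  # [20, 40, 20, 40, 20]
```

If the placeholder could be larger than a valid value (for example, negative valid values), the maximum would no longer pick the right one.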
Upvotes: 1
Reputation: 150735
You don't need to merge, which creates all possible pairs of rows with the same Model. The following will do:
df['Value'] = df['Value'].where(df['Value']!=-1).groupby(df['Model']).transform('first')
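As a runnable sketch of this approach on the sample data: turn the -1 placeholders into NaN, then broadcast each Model's first non-null value back to all of its rows. It assumes every Model has at least one valid value; the names valid and filled are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Model': ['A', 'B', 'A', 'B', 'A'],
                   'Value': [20, 40, 20, 40, -1]})

# Keep values that are not -1; the -1 placeholders become NaN.
valid = df['Value'].where(df['Value'] != -1)

# 'first' skips NaN, so each row receives its Model's first valid value.
filled = valid.groupby(df['Model']).transform('first')

# The NaN step makes the column float; cast back to integers.
# Assumption: no Model consists only of -1, otherwise NaN would remain.
df['Value'] = filled.astype('int64')

print(df['Value'].tolist())  # [20, 40, 20, 40, 20]
```

Unlike a merge, this never builds row pairs, so the result has the same number of rows as the input.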
Or you can also use map:
s = (df[df['Value'] != -1].drop_duplicates('Model')
       .set_index('Model')['Value'])
df['Value'] = df['Model'].map(s)
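A self-contained sketch of the map approach, with one extra case added for illustration: a hypothetical Model 'C' (not in the question) whose only row is -1. Since map returns NaN for a Model with no valid row, this sketch falls back to the original value there; whether that is the right fallback depends on your data.

```python
import pandas as pd

# 'C' is a made-up Model with no valid value, added to show the edge case.
df = pd.DataFrame({'Model': ['A', 'B', 'A', 'B', 'A', 'C'],
                   'Value': [20, 40, 20, 40, -1, -1]})

# One valid value per Model, indexed by Model, to use as a lookup table.
lookup = (df[df['Value'] != -1]
          .drop_duplicates('Model')
          .set_index('Model')['Value'])

# map looks up each row's Model; Models missing from lookup give NaN.
mapped = df['Model'].map(lookup)

# Fall back to the original value where no valid value exists.
df['Value'] = mapped.fillna(df['Value']).astype('int64')

print(df['Value'].tolist())  # [20, 40, 20, 40, 20, -1]
```

The lookup has at most one row per Model, so it stays small even when the DataFrame has millions of rows.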
Upvotes: 1