JuiceFV

Reputation: 191

Pandas replace doesn't replace column's values

Hello, Stack Overflow visitors.

I have a slice of a dataset, the values of which I wish to replace with another value. For example:

data_train[data_train.Fare < 6]['Fare']

Shows this output:

179    0.0000
263    0.0000
271    0.0000
277    0.0000
302    0.0000
378    4.0125
413    0.0000
466    0.0000
481    0.0000
597    0.0000
633    0.0000
674    0.0000
732    0.0000
806    0.0000
815    0.0000
822    0.0000
872    5.0000

I use a for loop to replace the zero values in every dataset, and its first iteration should update data_train. However, the output stays the same (the zeros are still there).

for dataset in [data_train, data_test]:
    lower_margin = 6 if 'Survived' in dataset else 3
    
    classes = dataset[dataset.Fare < lower_margin]['Pclass'].unique()
    for i in classes:
        dataset[dataset.Fare < lower_margin].loc[dataset.Pclass == i]['Fare'].replace(0.0000, round(dataset[dataset.Pclass == i]['Fare'].mean(),4), inplace=True) 

I also tried assigning the replaced Series back, but that didn't work either.

dataset[dataset.Fare < lower_margin].loc[dataset.Pclass == i]['Fare'] = dataset[dataset.Fare < lower_margin].loc[dataset.Pclass == i]['Fare'].replace(0.0000, round(dataset[dataset.Pclass == i]['Fare'].mean(),4)) 

I may have missed something, but I can't tell what exactly.

Update

The expected output should look like this:

179    84.1234
263    84.1234
271    84.1234
277    84.1234
302    84.1234
378    84.1234
413    84.1234
466    84.1234
481    84.1234
597    84.1234
633    84.1234
674    84.1234
732    84.1234
806    84.1234
815    84.1234
822    84.1234
872    84.1234

where

round(dataset[dataset.Pclass == i]['Fare'].mean(),4) == 84.1234

Note: the mean varies from Pclass to Pclass, but I've simplified the example by treating it as a constant.

Upvotes: 1

Views: 34

Answers (1)

jezrael

Reputation: 862511

I believe you need:

print (data_train)
        Fare  Pclass
179   2.5000       1
263   0.0000       1
271  30.0000       2
277  20.0000       2
302   0.0000       3
378   4.0125       3

out = []
for dataset in [data_train, data_test]:
    lower_margin = 6 if 'Survived' in dataset else 3
    
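    # boolean masks: fares below the threshold, and fares equal to 0
    m1 = dataset.Fare < lower_margin
    m2 = dataset.Fare == 0
    
    # filter the DataFrame by the threshold and compute the mean Fare per Pclass
    avg = dataset[m1].groupby('Pclass')['Fare'].mean()
    # replace only the 0 values with the mean mapped from each row's Pclass
    dataset.loc[m2, 'Fare'] = dataset.loc[m2, 'Pclass'].map(avg)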
    dataset.loc[m2, 'Fare'] = dataset.loc[m2, 'Pclass'].map(avg)
    out.append(dataset)
    
print (out[0])
        Fare  Pclass
179   2.5000       1
263   1.2500       1 <- correctly replaced by the class mean
271  30.0000       2
277  20.0000       2
302   0.0000       3 <- stays 0: with lower_margin = 3 the only class-3 fare below the threshold is 0
378   4.0125       3
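
As for why the original attempt has no effect: dataset[dataset.Fare < lower_margin] is boolean indexing, which returns a new object, so the chained replace(..., inplace=True) only changes that temporary and never reaches dataset. A single .loc call on the original frame avoids this. Below is a minimal, self-contained sketch of both behaviours. The sample frame and the names chained, fixed, zero and class_mean are made up for illustration, and it assumes the replacement should be the mean Fare of the whole Pclass (zeros included, rounded to 4 decimals as in the question), which differs from the thresholded mean used above.

```python
import warnings

import pandas as pd

# Hypothetical sample frame, same shape as the one printed above.
data_train = pd.DataFrame(
    {"Fare": [2.5, 0.0, 30.0, 20.0, 0.0, 4.0125], "Pclass": [1, 1, 2, 2, 3, 3]},
    index=[179, 263, 271, 277, 302, 378],
)

# 1) Chained indexing: the boolean filter builds a new object, so the
#    in-place replace only changes that temporary. pandas may warn about
#    this pattern; the warning is silenced here to keep the demo quiet.
chained = data_train.copy()
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    chained[chained.Fare < 6]["Fare"].replace(0.0, 99.0, inplace=True)
unchanged = chained.equals(data_train)  # True when the zeros are still there

# 2) One .loc call on the original frame writes the values back.
#    Assumption: the replacement is the mean fare of the whole Pclass
#    (zeros included), rounded to 4 decimals as in the question.
fixed = data_train.copy()
zero = fixed["Fare"] == 0
class_mean = fixed.groupby("Pclass")["Fare"].transform("mean").round(4)
fixed.loc[zero, "Fare"] = class_mean[zero]

print(unchanged)
print(fixed)
```

If the zeros should not count towards the mean, one option is to compute class_mean from fixed["Fare"].where(~zero) instead, since the groupby mean skips missing values by default.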

Upvotes: 1
