Reputation: 191
Hello, Stack Overflow visitors.
I have a slice of a dataset, the values of which I wish to replace with another value. For example:
data_train[data_train.Fare < 6]['Fare']
Shows this output:
179 0.0000
263 0.0000
271 0.0000
277 0.0000
302 0.0000
378 4.0125
413 0.0000
466 0.0000
481 0.0000
597 0.0000
633 0.0000
674 0.0000
732 0.0000
806 0.0000
815 0.0000
822 0.0000
872 5.0000
I use a for loop to replace the 0 values in all datasets. The first iteration of this loop should replace the values in data_train. However, the output remains the same (the zeros are still there).
for dataset in [data_train, data_test]:
    lower_margin = 6 if 'Survived' in dataset else 3
    classes = dataset[dataset.Fare < lower_margin]['Pclass'].unique()
    for i in classes:
        dataset[dataset.Fare < lower_margin].loc[dataset.Pclass == i]['Fare'].replace(0.0000, round(dataset[dataset.Pclass == i]['Fare'].mean(),4), inplace=True)
I've also tried reassigning the replaced Series, but that didn't work either.
dataset[dataset.Fare < lower_margin].loc[dataset.Pclass == i]['Fare'] = dataset[dataset.Fare < lower_margin].loc[dataset.Pclass == i]['Fare'].replace(0.0000, round(dataset[dataset.Pclass == i]['Fare'].mean(),4))
I may have missed something, but I don't know what exactly.
Update
The expected output should look like this:
179 84.1234
263 84.1234
271 84.1234
277 84.1234
302 84.1234
378 84.1234
413 84.1234
466 84.1234
481 84.1234
597 84.1234
633 84.1234
674 84.1234
732 84.1234
806 84.1234
815 84.1234
822 84.1234
872 84.1234
where
round(dataset[dataset.Pclass == i]['Fare'].mean(),4) == 84.1234
Note: the mean varies from Pclass to Pclass, but I've simplified the example by treating the mean as a constant.
Upvotes: 1
Views: 34
Reputation: 862511
I believe you need:
print (data_train)
Fare Pclass
179 2.5000 1
263 0.0000 1
271 30.0000 2
277 20.0000 2
302 0.0000 3
378 4.0125 3
out = []
for dataset in [data_train, data_test]:
    lower_margin = 6 if 'Survived' in dataset else 3
    # boolean masks: rows below the threshold, and rows with a zero fare
    m1 = dataset.Fare < lower_margin
    m2 = dataset.Fare == 0
    # filter the DataFrame by the threshold and aggregate the mean per Pclass
    avg = dataset[m1].groupby('Pclass')['Fare'].mean()
    # replace only the 0 values with the mapped averages
    dataset.loc[m2, 'Fare'] = dataset.loc[m2, 'Pclass'].map(avg)
    out.append(dataset)
print (out[0])
Fare Pclass
179 2.5000 1
263 1.2500 1 <- correctly replaced by the mean
271 30.0000 2
277 20.0000 2
302 0.0000 3
378 4.0125 3
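The original attempts fail because `dataset[mask]` with a boolean mask returns a new object, so the chained `.loc[...]['Fare'].replace(..., inplace=True)` and the chained assignment both modify a temporary copy rather than `dataset`. Writing through a single `dataset.loc[mask, 'Fare'] = ...` avoids that. Below is a minimal self-contained sketch of this idea. It assumes the replacement should be the mean over all rows of each Pclass (as in the question's loop), not only the rows under the threshold. The sample frame and the names `class_mean` and `is_zero` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, mirroring the small frame printed above
data_train = pd.DataFrame(
    {
        "Fare": [2.5, 0.0, 30.0, 20.0, 0.0, 4.0125],
        "Pclass": [1, 1, 2, 2, 3, 3],
    },
    index=[179, 263, 271, 277, 302, 378],
)

# Mean fare of each Pclass, broadcast back to every row of that class.
# Assumption: the mean is taken over all rows of the class, zeros included,
# as in the question's round(dataset[dataset.Pclass == i]['Fare'].mean(), 4).
class_mean = data_train.groupby("Pclass")["Fare"].transform("mean").round(4)

# Mask of the rows to fix
is_zero = data_train["Fare"] == 0

# A single .loc call writes into data_train itself, not into a copy
data_train.loc[is_zero, "Fare"] = class_mean[is_zero]

print(data_train)
```

Note that a mean computed this way still includes the zero fares themselves, which pulls it down. If that is not wanted, the zeros could be excluded before grouping.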
Upvotes: 1