Rosand Liu

Reputation: 409

Setting a value in a pandas DataFrame raises the error: 'Series' objects are mutable, thus they cannot be hashed

I want to replace the values in a pandas DataFrame where data['Bare Nuclei'] == '?' with the mean of the rows where data['Bare Nuclei'] != '?'.

import pandas as pd
import numpy as np
column_names = ['Sample code number', 'Clump Thickness', 
                'Uniformity of Cell Size', 'Uniformity of Cell Shape',
                'Marginal Adhesion', 'Single Epithelial Cell Size',
                'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli',
                'Mitoses', 'Class']
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', names = column_names )
mean = 0
n = 0
for index,row in data.iterrows():
    if row['Bare Nuclei'] != '?':
        n += 1
        mean += int(row['Bare Nuclei'])
mean = mean / n
temp = data
index = temp['Bare Nuclei'] == '?'
temp[index,'Bare Nuclei'] = mean

Jupyter Notebook gives me this error on the last line: 'Series' objects are mutable, thus they cannot be hashed

How do I change the value in the DataFrame, and why is my approach wrong? I look forward to your help!

Upvotes: 1

Views: 5098

Answers (2)

jezrael

Reputation: 862751

In the last line, use DataFrame.loc, because you need to set values in a column of the DataFrame for the rows selected by the mask:

temp.loc[index,'Bare Nuclei'] = mean
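
As a minimal sketch of why `.loc` is needed here: with plain `[]`, the pair `(index, 'Bare Nuclei')` is passed as a single key, so pandas tries to look it up as one column label and has to hash the Series inside it. The toy frame, the name `is_missing`, and the fill value `2.0` below are assumptions for illustration, not part of the original data.

```python
import pandas as pd

# Hypothetical toy frame; object dtype lets the column hold strings and floats.
df = pd.DataFrame({
    'Bare Nuclei': pd.Series(['1', '?', '3'], dtype=object),
    'Class': [2, 4, 2],
})
is_missing = df['Bare Nuclei'] == '?'

# df[is_missing, 'Bare Nuclei'] = 2.0 would treat the tuple as one column key and fail.
# .loc takes the row selector and the column label as two separate arguments:
df.loc[is_missing, 'Bare Nuclei'] = 2.0

print(df['Bare Nuclei'].tolist())
```

Only the rows where the mask is True are overwritten; the other rows keep their original string values.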

But in pandas it is best to avoid loops, because they are slow. A better solution is to replace ? with NaN and then use fillna with the column mean:

data['Bare Nuclei'] = data['Bare Nuclei'].replace('?', np.nan).astype(float)
#more general
#data['Bare Nuclei'] = pd.to_numeric(data['Bare Nuclei'], errors='coerce')
data['Bare Nuclei'] = data['Bare Nuclei'].fillna(data['Bare Nuclei'].mean())
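
The commented-out `pd.to_numeric` variant can be tried offline on a small column. This is only a sketch: the Series `s` and its values are made up for illustration and stand in for data['Bare Nuclei'].

```python
import pandas as pd

# Hypothetical toy column standing in for data['Bare Nuclei']
s = pd.Series(['1', '?', '3', '10'], dtype=object)

# errors='coerce' turns anything that cannot be parsed as a number (here '?') into NaN
numeric = pd.to_numeric(s, errors='coerce')

# mean() skips NaN by default, so the fill value is (1 + 3 + 10) / 3
filled = numeric.fillna(numeric.mean())
print(filled.tolist())
```

Unlike `replace('?', np.nan)`, this does not need to know in advance which placeholder marks a missing value, which is why it is the more general option.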

Alternative solution:

mask = data['Bare Nuclei'] == '?'
data['Bare Nuclei'] = data['Bare Nuclei'].mask(mask).astype(float)
data['Bare Nuclei'] = data['Bare Nuclei'].fillna(data['Bare Nuclei'].mean())

Verify solution:

column_names = ['Sample code number', 'Clump Thickness', 
                'Uniformity of Cell Size', 'Uniformity of Cell Shape',
                'Marginal Adhesion', 'Single Epithelial Cell Size',
                'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli',
                'Mitoses', 'Class']
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data', names = column_names )
#print (data.head())

#get index values by condition
L = data.index[data['Bare Nuclei'] == '?'].tolist()
print (L)
[23, 40, 139, 145, 158, 164, 235, 249, 275, 292, 294, 297, 315, 321, 411, 617]

#get mean of values converted to numeric
print (data['Bare Nuclei'].replace('?', np.nan).astype(float).mean())
3.5446559297218156

print (data.loc[L, 'Bare Nuclei'])
23     ?
40     ?
139    ?
145    ?
158    ?
164    ?
235    ?
249    ?
275    ?
292    ?
294    ?
297    ?
315    ?
321    ?
411    ?
617    ?
Name: Bare Nuclei, dtype: object

#convert to numeric: replace '?' with NaN and cast to float
data['Bare Nuclei'] = data['Bare Nuclei'].replace('?', np.nan).astype(float)
#more general alternative
#data['Bare Nuclei'] = pd.to_numeric(data['Bare Nuclei'], errors='coerce')
#replace NaNs with the column mean
data['Bare Nuclei'] = data['Bare Nuclei'].fillna(data['Bare Nuclei'].mean())

#verify replacing
print (data.loc[L, 'Bare Nuclei'])
23     3.544656
40     3.544656
139    3.544656
145    3.544656
158    3.544656
164    3.544656
235    3.544656
249    3.544656
275    3.544656
292    3.544656
294    3.544656
297    3.544656
315    3.544656
321    3.544656
411    3.544656
617    3.544656
Name: Bare Nuclei, dtype: float64

Upvotes: 2

Kurumi Tokisaki

Reputation: 171

temp[index,'Bare Nuclei'] mixes boolean indexing with label-based column selection, which plain [] indexing does not support. Instead, change

index = temp['Bare Nuclei'] == '?'
temp[index,'Bare Nuclei'] = mean

to

s=temp['Bare Nuclei']
temp['Bare Nuclei']=s.where(s!='?',mean)

where(s!='?',mean) keeps each element where the condition s!='?' holds and replaces it with mean where the condition is not met (which can be confusing at first glance).
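
To make the direction of `where` concrete, here is a small sketch comparing it with `Series.mask`, which replaces where the condition is True instead. The toy Series and `fill_value` are assumptions standing in for temp['Bare Nuclei'] and the computed mean.

```python
import pandas as pd

# Hypothetical toy column; object dtype so it can hold strings and a float
s = pd.Series(['1', '?', '3'], dtype=object)
fill_value = 2.0  # stand-in for the computed mean

# where: keep values where the condition is True, replace the rest
kept_where = s.where(s != '?', fill_value)

# mask: replace values where the condition is True, keep the rest
same_with_mask = s.mask(s == '?', fill_value)

print(kept_where.tolist())
print(same_with_mask.tolist())
```

Both calls give the same result. Note that the column stays object dtype with the remaining values still stored as strings, so a conversion such as astype(float) is still needed before doing arithmetic on it.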

Upvotes: 1
