Rawia Sammout

Reputation: 221

Imputation of missing value with median

I want to impute the missing values in a dataframe column called Bare Nuclei with the median, but I get this error: ('must be str, not int', 'occurred at index Bare Nuclei'). These are the unique values of the column data['Bare Nuclei']:

data['Bare Nuclei'].unique()
array(['1', '10', '2', '4', '3', '9', '7', '?', '5', '8', '6'],
      dtype=object)

Then I tried to replace '?' with NaN and fill the NaN values with the median, but I got the error above:

data['Bare Nuclei'] = data['Bare Nuclei'].replace('?',np.nan)
#data['Bare Nuclei'].fillna()
data.apply(lambda x: x.fillna(x.mean()),axis=0)

The data is available at this link: https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/

Upvotes: 2

Views: 6588

Answers (3)

Kerem Ürkmez

Reputation: 11

If you want to fill with medians in a more detailed and realistic way, for example per group rather than over the whole column, please check this function:

from tqdm import tqdm

def groupby_median_imputer(data, features_array, *args):
    # *args: any number of column names to group by
    print("Remaining missing values per column:")
    for i in tqdm(features_array):
        # transform keeps the original index, so the result aligns with data
        data[i] = data.groupby(list(args))[i].transform(lambda x: x.fillna(x.median()))
        print(i + " : " + str(data[i].isnull().sum()))
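As a minimal sketch of the group-wise idea, the example below fills each missing value with the median of its own group. The toy frame, its values, and the choice of "Class" as the grouping column are assumptions made for illustration, not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two groups, one missing value in each.
df = pd.DataFrame({
    "Class": [2, 2, 2, 4, 4, 4],
    "Bare Nuclei": [1.0, np.nan, 3.0, 10.0, 8.0, np.nan],
})

# Fill each NaN with the median of the rows that share its "Class".
# transform returns a result aligned with the original index.
df["Bare Nuclei"] = df.groupby("Class")["Bare Nuclei"].transform(
    lambda s: s.fillna(s.median())
)

print(df)
```

Here the gap in group 2 is filled from the values 1 and 3, and the gap in group 4 from 10 and 8, so the two groups get different fill values.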

Upvotes: 0

Rawia Sammout

Reputation: 221

This is the correction, and it works:

data['Bare Nuclei'] = data['Bare Nuclei'].replace('?',np.nan).astype(float)
data['Bare Nuclei'] = data['Bare Nuclei'].fillna((data['Bare Nuclei'].median()))

Upvotes: 0

Craig

Reputation: 4855

The error you got is because the values stored in the 'Bare Nuclei' column are stored as strings, but the mean() function requires numbers. You can see that they are strings in the result of your call to .unique().
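One way to confirm this before imputing is to check whether the column is numeric and to list the entries that cannot be parsed as numbers. This is only a sketch on a made-up Series standing in for the real column:

```python
import pandas as pd

# Hypothetical stand-in for data['Bare Nuclei']: digits stored as strings, plus '?'.
col = pd.Series(["1", "10", "?", "5"], name="Bare Nuclei")

# False here, because the values are strings rather than numbers.
is_numeric = pd.api.types.is_numeric_dtype(col)

# Entries that pd.to_numeric cannot parse become NaN with errors="coerce";
# selecting them from the original column shows the offending placeholders.
bad = col[pd.to_numeric(col, errors="coerce").isna()].unique()

print(is_numeric, list(bad))
```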

After replacing the '?' characters, you can convert the series to numbers using .astype(float):

data['Bare Nuclei'] = data['Bare Nuclei'].replace('?', np.nan).astype(float)
# call fillna on the whole Series; swap .mean() for .median() to impute with the median
data['Bare Nuclei'] = data['Bare Nuclei'].fillna(data['Bare Nuclei'].mean())
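An alternative that skips the explicit replace step is pd.to_numeric with errors="coerce", which turns any non-numeric entry into NaN in one pass. The sketch below uses a small made-up frame rather than the real file, and fills with the median as the question asks:

```python
import pandas as pd

# Toy stand-in for the real column (values are made up for illustration).
data = pd.DataFrame({"Bare Nuclei": ["1", "10", "?", "4", "2"]})

# errors="coerce" converts anything non-numeric, including "?", to NaN.
bare = pd.to_numeric(data["Bare Nuclei"], errors="coerce")

# NaN is skipped by median(), so the fill value comes from 1, 2, 4, 10.
data["Bare Nuclei"] = bare.fillna(bare.median())

print(data)
```

Note that coercion silently converts every unparseable entry, so it is worth checking which values were turned into NaN if the column may contain placeholders other than '?'.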

Upvotes: 1
