Reputation: 29
I have a dataset in which one variable has the object data type, and I need to convert it to int64.
Upvotes: 0
Views: 25374
Reputation: 1
I had the same problem with the same dataset.
There are lots of "?" entries in the 'bare_nuclei' column of the csv itself (16 of them). You need to treat them as missing values and drop the rows that have a "?" in the bare_nuclei column. Also, as a heads-up, don't name the 'class' column class: that's a reserved keyword in Python, so attribute access such as df.class is a syntax error, and it will cause problems later.
You can fix this at import using:
import numpy as np
import pandas as pd

# Treat these markers as missing values while the file is read
missing_values = ["NA", "N/a", np.nan, "?"]
l1 = pd.read_csv("../DataSets/Breast cancer dataset/breast-cancer-wisconsin.data",
                 header=None, na_values=missing_values,
                 names=['id', 'clump_thickness', 'uniformity_of_cell_size',
                        'uniformity_of_cell_shape', 'marginal_adhesion',
                        'single_epithelial_cell_size', 'bare_nuclei', 'bland_chromatin',
                        'normal_nucleoli', 'mitoses', 'diagnosis'])
# Drop every row that contains a missing value
l1 = l1.dropna()
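Note that after the "?" entries become NaN, pandas stores bare_nuclei as float64, so a final cast is still needed to get int64. Here is a minimal sketch of the whole round trip. The three-row sample is made up for illustration (it only mimics the layout of the real file), and the variable names `raw`, `cols`, and `sample` are my own:

```python
import io
import pandas as pd

# Hypothetical three-row sample laid out like the real file (not real data)
raw = io.StringIO(
    "1000025,5,1,1,1,2,1,3,1,1,2\n"
    "1057013,8,4,5,1,2,?,7,3,1,4\n"
    "1017122,8,10,10,8,7,10,9,7,1,4\n"
)
cols = ['id', 'clump_thickness', 'uniformity_of_cell_size',
        'uniformity_of_cell_shape', 'marginal_adhesion',
        'single_epithelial_cell_size', 'bare_nuclei', 'bland_chromatin',
        'normal_nucleoli', 'mitoses', 'diagnosis']

# "?" is read as NaN, which forces the column to float64
sample = pd.read_csv(raw, header=None, names=cols, na_values=["?"])
dtype_after_read = sample['bare_nuclei'].dtype

# Drop the incomplete row, then cast the now-clean column to int64
sample = sample.dropna().astype({'bare_nuclei': 'int64'})
dtype_after_cast = sample['bare_nuclei'].dtype
```

The cast only succeeds because dropna() has already removed the NaN rows; a plain int64 column cannot hold NaN.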
Upvotes: 0
Reputation: 310
You can try df["Bare Nuclei"] = df["Bare Nuclei"].astype(np.int64) (astype returns a new Series, so assign it back),
but as far as I can see the problem is something else. Pandas reads all the data to infer the best data type for each column, and only then builds the data frame. Since the column came out as object, there must be some entries that are not integers, i.e., they may contain letters or other symbols. In that case the typecast will also raise an error, so you need to remove those entries before you can convert the column to an integer type.
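To illustrate, here is a rough sketch of finding and removing the offending entries before the cast. The four-value frame is made up for the example, and using pd.to_numeric with errors="coerce" is just one way to do it, not the only one:

```python
import pandas as pd

# Hypothetical column with one non-numeric entry, as in the question
df = pd.DataFrame({"Bare Nuclei": ["1", "10", "?", "3"]})

# A direct cast fails because "?" cannot be parsed as an integer
try:
    df["Bare Nuclei"].astype("int64")
    cast_failed = False
except ValueError:
    cast_failed = True

# Coerce unparseable entries to NaN so they can be located
numeric = pd.to_numeric(df["Bare Nuclei"], errors="coerce")
bad_entries = df.loc[numeric.isna(), "Bare Nuclei"].tolist()

# Keep only the rows that parsed, then cast the column to int64
clean = df[numeric.notna()].copy()
clean["Bare Nuclei"] = numeric[numeric.notna()].astype("int64")
```

Looking at `bad_entries` first is worthwhile, because it tells you whether the bad values are a placeholder like "?" or something that needs a different fix.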
Upvotes: 4