pandas DataFrame shows ints as floats. But I would like to show those ints as ints.
X_train = train.iloc[:, 1:].values.astype('float32')
y_train = train.iloc[:, 0].values.astype('uint8')
X = test.values.astype('float32')
So, the dtypes are 'float32', 'uint8' and 'float32'.
I show the min and max values of X_train, y_train and X in a DataFrame (in a Jupyter Notebook):
pd.DataFrame([[np.amin(X_train), np.amax(X_train)],
[np.amin(y_train), np.amax(y_train)],
[np.amin(X), np.amax(X)]],
columns = ['min', 'max'],
index = ['X_train', 'y_train', 'X'])
Output:
min max
X_train 0.0 255.0
y_train 0.0 9.0
X 0.0 255.0
But I would expect:
min max
X_train 0.0 255.0
y_train 0 9
X 0.0 255.0
But...
print(np.amax(y_train))
outputs 9 (not 9.0).
Any suggestions?
pandas types things by column, so each column has a single dtype. It determines that up-casting the ints is better, so that the entire column can be float rather than dtype object.
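A minimal sketch of that per-column rule, using made-up column names (`ints`, `floats`, `mixed` are hypothetical): a column holding only ints keeps an integer dtype, while a column that mixes ints and floats is up-cast to float64.

```python
import pandas as pd

# Hypothetical columns: one all-int, one all-float, one mixing both
df = pd.DataFrame({
    'ints': [0, 9],          # only Python ints -> integer dtype
    'floats': [0.0, 255.0],  # only floats -> float64
    'mixed': [0, 255.0],     # int + float -> the int is up-cast to float64
})
print(df.dtypes)
print(df)
```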
df = pd.DataFrame([
[0., 255.],
[0, 9],
[0., 255.]
])
df
0 1
0 0.0 255.0
1 0.0 9.0
2 0.0 255.0
df.dtypes
0 float64
1 float64
dtype: object
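The question's table hits the same rule. A sketch with hypothetical stand-in arrays (the real data isn't shown): `np.amax(y_train)` returns a NumPy `uint8` scalar, which is why `print` shows 9, but once it shares a column with `float32` values, pandas stores it as a float.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins with the question's dtypes (the real data is not shown)
X_train = np.array([[0.0, 128.0], [255.0, 3.0]], dtype='float32')
y_train = np.array([3, 0, 9], dtype='uint8')

y_max = np.amax(y_train)
print(type(y_max), y_max)  # a NumPy uint8 scalar, so it prints as 9

df = pd.DataFrame([[np.amin(X_train), np.amax(X_train)],
                   [np.amin(y_train), np.amax(y_train)]],
                  columns=['min', 'max'],
                  index=['X_train', 'y_train'])
print(df.dtypes)                 # both columns get a floating-point dtype
print(df.loc['y_train', 'max'])  # the uint8 value is now stored as a float
```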
Use dtype=object to retain the individual types.
df = pd.DataFrame([
[0., 255.],
[0, 9],
[0., 255.]
], dtype=object)
df
0 1
0 0 255
1 0 9
2 0 255
df.dtypes
0 object
1 object
dtype: object
df.applymap(type)  # in pandas >= 2.1 applymap is deprecated; use df.map(type)
0 1
0 <class 'float'> <class 'float'>
1 <class 'int'> <class 'int'>
2 <class 'float'> <class 'float'>
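I'd only use this for reporting purposes. If you use it for further calculations, you lose the efficiency of pandas' numeric dtypes, because every cell is a generic Python object. I'd rather spend the time rearranging your data so that each column holds a single type.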
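If the object-dtype table does need to feed further calculations, one option (a sketch, not part of the original answer; the name `numeric` is hypothetical) is `DataFrame.infer_objects()`, which re-infers a numeric dtype per column. Because each column mixes ints and floats, both columns come back as float64, so the ints display as floats again.

```python
import pandas as pd

# The object-dtype frame from above: each cell keeps its own Python type
df = pd.DataFrame([
    [0., 255.],
    [0, 9],
    [0., 255.]
], dtype=object)

# Re-infer a proper numeric dtype for each column before doing math
numeric = df.infer_objects()
print(numeric.dtypes)  # float64 for both columns (ints and floats are mixed)
print(numeric.sum())
```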
A DataFrame stores one dtype per column (columns with the same dtype are kept together in 2D NumPy arrays), so a single column cannot hold both ints and floats without falling back to dtype object. In this case, each of your columns contains some floats, so pandas chooses a floating-point type. If you want values of different types, put them in different columns rather than different rows, i.e. add them as separate Series. See this answer for more information.
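A minimal sketch of the separate-Series approach, assuming hypothetical stand-in arrays and the hypothetical names `arrays` and `stats`: each array's min/max becomes its own column, built with that array's dtype, so `y_train` stays `uint8` while the other two stay `float32`. The layout is transposed relative to the question (arrays as columns, min/max as rows).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins with the question's dtypes (the real data is not shown)
X_train = np.array([[0.0, 128.0], [255.0, 3.0]], dtype='float32')
y_train = np.array([3, 0, 9], dtype='uint8')
X = np.array([[0.0, 255.0], [17.0, 64.0]], dtype='float32')

arrays = {'X_train': X_train, 'y_train': y_train, 'X': X}

# One Series per array, built with that array's dtype, so each becomes
# its own column and keeps its own type (float32 / uint8 / float32)
stats = pd.DataFrame({
    name: pd.Series([arr.min(), arr.max()], index=['min', 'max'], dtype=arr.dtype)
    for name, arr in arrays.items()
})
print(stats.dtypes)
print(stats)
```

Here the `y_train` column displays as 0 and 9, while the float32 columns display as 0.0 and 255.0. Transposing back to the question's layout (`stats.T`) would put mixed types into shared columns again, so the ints would be up-cast once more.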