René

Reputation: 4827

pandas DataFrame shows ints as floats

A pandas DataFrame displays my ints as floats, but I would like them shown as ints.

X_train = train.iloc[:, 1:].values.astype('float32')
y_train = train.iloc[:, 0].values.astype('uint8')
X = test.values.astype('float32')

So the dtypes are 'float32', 'uint8' and 'float32'.

I show the min and max values of X_train, y_train and X in a DataFrame (in a Jupyter Notebook):

import numpy as np
import pandas as pd

pd.DataFrame([[np.amin(X_train), np.amax(X_train)],
              [np.amin(y_train), np.amax(y_train)],
              [np.amin(X), np.amax(X)]],
             columns=['min', 'max'],
             index=['X_train', 'y_train', 'X'])
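Since train and test are not shown, here is a self-contained reproduction that uses small made-up arrays with the same dtypes (the array values are hypothetical). It shows that the scalar returned by np.amax(y_train) is still a NumPy uint8, while the DataFrame columns built from the mixed rows end up floating point:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the arrays built from `train` and `test`.
X_train = np.array([[0, 128], [64, 255]], dtype='float32')
y_train = np.array([0, 3, 9], dtype='uint8')
X = np.array([[0, 255]], dtype='float32')

df = pd.DataFrame([[np.amin(X_train), np.amax(X_train)],
                   [np.amin(y_train), np.amax(y_train)],
                   [np.amin(X), np.amax(X)]],
                  columns=['min', 'max'],
                  index=['X_train', 'y_train', 'X'])

# The scalar itself is still an integer type (numpy.uint8)...
print(type(np.amax(y_train)))
# ...but each DataFrame column mixes float32 and uint8 values,
# so pandas gives every column a floating-point dtype.
print(df.dtypes)
```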

Output:

        min max
X_train 0.0 255.0
y_train 0.0 9.0
X       0.0 255.0

But I would expect:

        min max
X_train 0.0 255.0
y_train 0   9
X       0.0 255.0

But...

print(np.amax(y_train))

outputs 9 (not 9.0).

Any suggestions?

Upvotes: 3

Views: 4049

Answers (2)

piRSquared

Reputation: 294546

pandas assigns dtypes per column, so every column has exactly one dtype. When a column mixes ints and floats, pandas decides that up-casting the ints to float is better than falling back to dtype object for the whole column.

df = pd.DataFrame([
    [0., 255.],
    [0, 9],
    [0., 255.]
])

df

     0      1
0  0.0  255.0
1  0.0    9.0
2  0.0  255.0

df.dtypes

0    float64
1    float64
dtype: object

Use dtype=object to retain the individual types.

df = pd.DataFrame([
    [0., 255.],
    [0, 9],
    [0., 255.]
], dtype=object)

df

   0    1
0  0  255
1  0    9
2  0  255

df.dtypes

0    object
1    object
dtype: object

df.applymap(type)  # in pandas >= 2.1 this is df.map(type); applymap is deprecated

                 0                1
0  <class 'float'>  <class 'float'>
1    <class 'int'>    <class 'int'>
2  <class 'float'>  <class 'float'>

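I'd only use this for reporting purposes. Object columns hold Python objects instead of a typed NumPy array, so you lose most of pandas' vectorized efficiency in any further calculations. For anything beyond display, I'd spend the time rearranging your data so that each column holds values of a single type.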
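A minimal sketch of one such rearrangement, assuming a transposed layout (one column per array, one row per statistic) is acceptable for the report. Each column is built from a Series that carries its own array's dtype, so nothing needs to be up-cast. The small arrays are hypothetical stand-ins for the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical small arrays with the same dtypes as in the question.
X_train = np.array([[0, 128], [64, 255]], dtype='float32')
y_train = np.array([0, 3, 9], dtype='uint8')
X = np.array([[0, 255]], dtype='float32')

arrays = {'X_train': X_train, 'y_train': y_train, 'X': X}

# One column per array: each Series is created with that array's own
# dtype, so every column keeps a single, native type.
summary = pd.DataFrame({
    name: pd.Series([arr.min(), arr.max()], index=['min', 'max'], dtype=arr.dtype)
    for name, arr in arrays.items()
})

print(summary)
print(summary.dtypes)
```

Because y_train now has a column to itself, it keeps its uint8 dtype and displays as 0 and 9, while the other two columns stay float32.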

Upvotes: 3

csander

Reputation: 1380

A DataFrame stores each column with a single dtype; internally, columns of the same dtype are held together in NumPy 2D arrays. In this case each column contains some floats, so pandas chooses a floating-point type for it. If you want values of different types to keep their own types, they need to live in separate columns, which you can add as separate Series. See this answer for more information.
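A small sketch with made-up values illustrating that dtypes belong to columns, not rows: two columns built from separate Series keep their own dtypes, but selecting a single row cuts across both columns and forces a common dtype again:

```python
import numpy as np
import pandas as pd

# Two columns with different dtypes, each built from its own Series
# (values are hypothetical, chosen to mirror the question).
df = pd.DataFrame({
    'y_train': pd.Series([0, 9], index=['min', 'max'], dtype='uint8'),
    'X_train': pd.Series([0.0, 255.0], index=['min', 'max'], dtype='float32'),
})
print(df.dtypes)  # per-column dtypes are preserved

# A row spans both columns, so the resulting Series needs one common
# dtype and the uint8 value is up-cast to a float.
row = df.loc['max']
print(row.dtype)
print(row['y_train'])
```

This is why the question's layout, where the min and max columns mix the integer y_train statistics with the float ones, cannot display y_train as integers without resorting to dtype object.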

Upvotes: 1
