gabboshow

Reputation: 5569

Avoid automatic conversion to float in Python

I have the following parameters:

import itertools
import pandas as pd

param_grid = dict(par1=[0.1, 1.1, 1.2],
                  par2=[3, 4, 5],
                  par3=[6, 7, 8])

I would like to create a table with all possible combinations of the parameters. I tried the following code:

hyperParamSpace = pd.DataFrame([row for row in itertools.product(*param_grid.values())], 
                               columns=param_grid.keys())

When I select a single combination with hyperParamSpace.iloc[1], all the parameters are converted to floats:

par3    6.0
par2    3.0
par1    1.1
Name: 1, dtype: float64

How can I keep the integers as integers?

Upvotes: 1

Views: 607

Answers (1)

Sergey Antopolskiy

Reputation: 4290

This happens because each column of a pandas DataFrame is essentially a NumPy array. All elements of such an array must share one type, otherwise it loses most of its computational advantages. Therefore, if one of the elements in a column is a float, all the elements of that column are automatically converted to floats.
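As a minimal sketch of that rule (the names ints_only and mixed are made up for this illustration), a single float is enough to change the dtype of a whole column:

```python
import pandas as pd

# Hypothetical example columns, not taken from the question.
ints_only = pd.Series([3, 4, 5])    # every element is an int
mixed = pd.Series([3, 4, 5.5])      # one float among the ints

print(ints_only.dtype)  # an integer dtype (int64 on most setups)
print(mixed.dtype)      # float64: the ints were upcast to match the float
print(mixed.iloc[0])    # 3.0
```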

You can control dtype of the array, and by extension, the DataFrame, manually and set it to int, but you will lose your floats in this case.

However, in your example the columns that hold ints really are of type int64 (you can verify this by running hyperParamSpace.par2.dtype). The conversion happens when you select a row with iloc: the row is returned as a Series, which is again a single array whose elements must all have the same type, so the ints are upcast to floats to match par1.
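A short check of this, assuming the param_grid from the question and a shortened variable name df for the table:

```python
import itertools
import pandas as pd

param_grid = dict(par1=[0.1, 1.1, 1.2],
                  par2=[3, 4, 5],
                  par3=[6, 7, 8])

df = pd.DataFrame(list(itertools.product(*param_grid.values())),
                  columns=list(param_grid.keys()))

# Column by column, the integer parameters keep an integer dtype.
print(df.dtypes)

# A row mixes values from all columns, so it gets one common dtype.
row = df.iloc[1]
print(row.dtype)  # float64

# Scalar access reads from a single column, so no upcast happens here.
value = df.at[1, "par2"]
print(value)
```

If you only need individual values, scalar access with .at avoids the row-level upcast without changing the DataFrame.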

What you can do to avoid the conversion is to specify dtype of your DataFrame as object:

hyperParamSpace = pd.DataFrame([row for row in itertools.product(*param_grid.values())],
                                columns=param_grid.keys(), dtype=object)

Object columns are much less efficient than numeric ones, because the values are stored as generic Python objects rather than in a typed array, but since your parameter table is small, it shouldn't be a problem.
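To confirm the effect, here is a small sketch that rebuilds the table with dtype=object and inspects one row (obj_df and row are names chosen for this example):

```python
import itertools
import pandas as pd

param_grid = dict(par1=[0.1, 1.1, 1.2],
                  par2=[3, 4, 5],
                  par3=[6, 7, 8])

obj_df = pd.DataFrame(list(itertools.product(*param_grid.values())),
                      columns=list(param_grid.keys()),
                      dtype=object)

row = obj_df.iloc[1]
print(row.dtype)  # object

# Each value keeps its own type instead of being upcast to float.
for name, value in row.items():
    print(name, value, type(value))
```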

Upvotes: 4
