Reputation: 1089
I create a DataFrame from 11 lists. Four of these are lists of ints, and the remaining seven are lists of floats. I build the DataFrame using
df = pd.DataFrame({
    col_headers[0]: pd.Series(upper_time, dtype='float'),
    col_headers[1]: pd.Series(upper_pres, dtype='float'),
    col_headers[2]: pd.Series(upper_indx, dtype='int'),
    col_headers[3]: pd.Series(upper_pulses, dtype='int'),
    col_headers[4]: pd.Series(median_upper_pulses, dtype='float'),
    col_headers[5]: pd.Series(lower_time, dtype='float'),
    col_headers[6]: pd.Series(lower_pres, dtype='float'),
    col_headers[7]: pd.Series(lower_indx, dtype='int'),
    col_headers[8]: pd.Series(lower_pulses, dtype='int'),
    col_headers[9]: pd.Series(median_lower_pulses, dtype='float'),
    col_headers[10]: pd.Series(median_both_pulses, dtype='float'),
})
Unfortunately, when I check df.dtypes, I get
df.dtypes
Upper Systole Time              float64
Upper Systole Pressure          float64
Upper Systole Index               int32
Upper Systole Pulses              int32
Median Upper Systolic Pulses    float64
Lower Systole Time              float64
Lower Systole Pressure          float64
Lower Systole Index             float64
Lower Systole Pulses            float64
Median Lower Systolic Pulses    float64
Median Both Systolic Pulses     float64
dtype: object
Upper Systole Index, Lower Systole Index, Upper Systole Pulses and Lower Systole Pulses should all be ints (and they are if I check the type of every element in the relevant lists). But somehow, when I create a dataframe, two of the four ints get coerced to floats in spite of my explicit direction to keep them as ints.
I suspect that this has something to do with the fact that lists 0-4 have one length, and lists 5-10 have a different length, but lots of Googling and searching through StackOverflow has not thrown up an answer.
How can I ensure that my ints remain ints?
Upvotes: 1
Views: 633
Reputation: 1089
filippo, thank you very much: dtype='Int64' with a capital 'I' did the trick. I was unaware of this, and it is nicely written up at https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html, where it is stated that pd.Int64Dtype() is aliased to 'Int64'.
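A minimal sketch of that fix, with two short made-up lists standing in for the real data (the list contents are assumptions; only the column names come from the question):

```python
import pandas as pd

# Hypothetical stand-ins for the real lists; note the different lengths.
upper_indx = [10, 20, 30, 40]
lower_indx = [15, 25]

# 'Int64' (capital I) is pandas' nullable integer dtype, so the rows
# missing from the shorter column can be held as <NA> instead of NaN.
df = pd.DataFrame({
    "Upper Systole Index": pd.Series(upper_indx, dtype="Int64"),
    "Lower Systole Index": pd.Series(lower_indx, dtype="Int64"),
})

print(df.dtypes)
print(df)
```

Both columns should report Int64, with the missing rows of the shorter column shown as <NA>.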
Thanks again
Thomas Philips
Upvotes: 1
Reputation: 1516
If you do the following:
pd.DataFrame({"A": pd.Series([1, 2, 3, 4], dtype='int'),
              "B": pd.Series([1, 3], dtype='int')}).astype(int)
You will get the following error:
867 if not np.isfinite(arr).all():
--> 868 raise ValueError("Cannot convert non-finite values (NA or inf) to integer")
869
870 elif is_object_dtype(arr):
ValueError: Cannot convert non-finite values (NA or inf) to integer
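This indicates that the issue is the presence of NaNs. The Series have different lengths, so when the DataFrame aligns them on their index, the shorter one is padded with NaN. A NumPy integer column cannot hold NaN, so pandas silently upcasts that column to float64.
If you replace the NaN values with an integer (0, for example), you should then be able to coerce the affected columns back to integers with .astype(int). Keep in mind that the filled-in values are then indistinguishable from real data.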
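To see where the NaNs come from, here is a small sketch that builds the same two Series as above and inspects the result before any .astype call (the variable names a and b are just for illustration):

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4], dtype="int")
b = pd.Series([1, 3], dtype="int")

# Building the DataFrame aligns both Series on the union of their indexes
# (0..3); b has no values at index 2 and 3, so those cells become NaN.
df = pd.DataFrame({"A": a, "B": b})

print(df)
print(df.dtypes)          # A stays an integer dtype, B has become float64
print(df["B"].isna())     # True for the two padded rows
```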
Example:
df = pd.DataFrame({"A": pd.Series([1, 2, 3, 4], dtype='int'),
                   "B": pd.Series([1, 3], dtype='int')})
df["B"] = df["B"].fillna(0).astype(int)
Upvotes: 0