Reputation: 742
I keep going around in circles and have tried many different approaches, so I suspect my basic understanding is wrong. I would be grateful for help in understanding my encoding/decoding issues.
I import the DataFrame from SQL, and some columns that should be float64 come through as object, so I cannot do any calculations on them. I have not managed to convert the object columns back to float64.
df.head()
Date WD Manpower 2nd CTR 2ndU T1 T2 T3 T4
2013/4/6 6 NaN 2,645 5.27% 0.29 407 533 454 368
2013/4/7 7 NaN 2,118 5.89% 0.31 257 659 583 369
2013/4/13 6 NaN 2,470 5.38% 0.29 354 531 473 383
2013/4/14 7 NaN 2,033 6.77% 0.37 396 748 681 458
2013/4/20 6 NaN 2,690 5.38% 0.29 361 528 541 381
df.dtypes
WD float64
Manpower float64
2nd object
CTR object
2ndU float64
T1 object
T2 object
T3 object
T4 object
T5 object
dtype: object
SQL table:
Upvotes: 39
Views: 222063
Reputation: 409
For me, the problem lies in the table column definition. My Oracle table definition has columns like below:
SQL> desc abc
mval NUMBER(38,15),
sval NUMBER(38,15),
nval NUMBER(38)
My IPython SQL query has to use cast to ensure the column is loaded as float64:
ds = %sql select \
cast(mval as float) as mval, \
sval, \
nval \
from abc
df = ds.DataFrame()
Check columns: df.dtypes
mval float64
sval object
nval int64
Note that the column sval, which was selected without a cast, is an object. It needs pd.to_numeric before it can be used with most statistical functions.
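A minimal sketch of that last step, assuming the uncast column arrives as decimal.Decimal objects (a common result for Oracle NUMBER columns, but an assumption here, as are the sample values):

```python
from decimal import Decimal

import pandas as pd

# Hypothetical stand-in for the query result: the uncast column holds Decimal objects.
df = pd.DataFrame({
    "mval": [1.5, 2.25],
    "sval": [Decimal("1.500000000000000"), Decimal("2.250000000000000")],
})
print(df["sval"].dtype)  # object

# Convert the object column so numeric functions work on it.
df["sval"] = pd.to_numeric(df["sval"])
print(df["sval"].dtype)  # float64
print(df["sval"].mean())
```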
Upvotes: 0
Reputation: 41
X = np.array(X, dtype=float)
This converts X to an array of floats; it works in Python 3.7.6.
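One caveat worth checking: this only works when every entry already parses as a float. A small sketch with made-up values in the style of the question's columns:

```python
import numpy as np

# Plain numeric strings convert without trouble.
clean = np.array(["407", "0.29", "533"], dtype=float)
print(clean.dtype)  # float64

# Thousands separators and percent signs make the conversion fail.
try:
    np.array(["2,645", "5.27%"], dtype=float)
    converted = True
except ValueError:
    converted = False  # strip ',' and '%' first
print(converted)  # False
```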
Upvotes: 2
Reputation: 393923
You can convert most of the columns by just calling convert_objects:
In [36]:
df = df.convert_objects(convert_numeric=True)
df.dtypes
Out[36]:
Date object
WD int64
Manpower float64
2nd object
CTR object
2ndU float64
T1 int64
T2 int64
T3 int64
T4 float64
dtype: object
For columns '2nd' and 'CTR' we can call the vectorised str methods to replace the thousands separator and remove the '%' sign, and then astype to convert:
In [39]:
df['2nd'] = df['2nd'].str.replace(',','').astype(int)
df['CTR'] = df['CTR'].str.replace('%','').astype(np.float64)
df.dtypes
Out[39]:
Date object
WD int64
Manpower float64
2nd int32
CTR float64
2ndU float64
T1 int64
T2 int64
T3 int64
T4 object
dtype: object
In [40]:
df.head()
Out[40]:
Date WD Manpower 2nd CTR 2ndU T1 T2 T3 T4
0 2013/4/6 6 NaN 2645 5.27 0.29 407 533 454 368
1 2013/4/7 7 NaN 2118 5.89 0.31 257 659 583 369
2 2013/4/13 6 NaN 2470 5.38 0.29 354 531 473 383
3 2013/4/14 7 NaN 2033 6.77 0.37 396 748 681 458
4 2013/4/20 6 NaN 2690 5.38 0.29 361 528 541 381
Or you can do the string handling operations above without the call to astype and then call convert_objects to convert everything in one go.
UPDATE
Since version 0.17.0, convert_objects is deprecated (it was later removed in pandas 1.0) and there isn't a top-level function to do this, so you need to do:
df.apply(lambda col:pd.to_numeric(col, errors='coerce'))
See the docs and this related question: pandas: to_numeric for multiple columns
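Putting the pieces together, here is a rough sketch on a cut-down copy of the question's data. The column subset and the numeric_cols list are my own choices for illustration. Note that errors='coerce' silently turns anything unparseable into NaN, so it is safer to apply it only to the columns you expect to be numeric and to leave Date out.

```python
import pandas as pd

# dtype=object mimics how the columns arrived from SQL.
df = pd.DataFrame({
    "Date": ["2013/4/6", "2013/4/7"],
    "2nd": ["2,645", "2,118"],
    "CTR": ["5.27%", "5.89%"],
    "T1": ["407", "257"],
}, dtype=object)

# Strip the characters that block parsing.
df["2nd"] = df["2nd"].str.replace(",", "", regex=False)
df["CTR"] = df["CTR"].str.replace("%", "", regex=False)

# Convert only the columns that should be numeric.
numeric_cols = ["2nd", "CTR", "T1"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
print(df.dtypes)
```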
Upvotes: 49
Reputation: 95
I had this problem in a DataFrame (df) created from an Excel sheet with several internal header rows.
After cleaning out the internal header rows from df, the columns' values were of "non-null object" type (as reported by DataFrame.info()).
This code converted all numerical values of multiple columns to int64 and float64 in one go:
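for col in df.columns:
    # Assign by column label: in pandas >= 2.0, df.iloc[:, i] = ... writes in place and keeps the object dtype
    # errors='ignore' lets strings remain as 'non-null objects' (deprecated since pandas 2.2)
    df[col] = pd.to_numeric(df[col], errors='ignore')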
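Since errors='ignore' is deprecated as of pandas 2.2, the same effect can be had by catching the exception yourself. This is only a sketch: to_numeric_if_possible is a helper name I made up, and the sample frame is invented.

```python
import pandas as pd


def to_numeric_if_possible(col):
    """Return the column as numbers, or unchanged if any value fails to parse."""
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col


df = pd.DataFrame(
    {"name": ["a", "b"], "count": ["3", "4"], "ratio": ["0.5", "1.5"]},
    dtype=object,
)
# apply() calls the helper once per column and rebuilds the frame.
df = df.apply(to_numeric_if_possible)
print(df.dtypes)
```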
Upvotes: 7
Reputation: 1733
convert_objects is deprecated.
For pandas >= 0.17.0, use pd.to_numeric:
df["2nd"] = pd.to_numeric(df["2nd"])
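Be aware that with values like the ones in the question, pd.to_numeric on its own will raise, because of the thousands separator. A small sketch with two of the question's values:

```python
import pandas as pd

s = pd.Series(["2,645", "2,118"], dtype=object)

# The default errors='raise' rejects the comma.
raised = False
try:
    pd.to_numeric(s)
except ValueError as exc:
    raised = True
    print("raised:", exc)

# errors='coerce' does not raise, but every unparseable entry becomes NaN.
coerced = pd.to_numeric(s, errors="coerce")

# Removing the separator first gives the intended integers.
cleaned = pd.to_numeric(s.str.replace(",", "", regex=False))
print(cleaned.tolist())  # [2645, 2118]
```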
Upvotes: 41
Reputation: 155
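Or you can use regular expressions to handle several unwanted characters at once, as the general case of this issue. Keep the decimal point in the result, and pass regex=True (since pandas 2.0, str.replace treats the pattern as a literal string by default):
df['2nd'] = pd.to_numeric(df['2nd'].str.replace(r'[,%]', '', regex=True))
df['CTR'] = pd.to_numeric(df['CTR'].str.replace(r'[^\d.]', '', regex=True))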
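To check a pattern before running it on a whole column, it helps to try it on a few sample strings. The sketch below uses a character class that keeps digits, the decimal point and a minus sign; the sample values are made up.

```python
import pandas as pd

raw = pd.Series(["2,645", "5.27%", "-0.29"], dtype=object)

# Drop everything except digits, '.', and '-'; regex=True makes the pattern a regex.
cleaned = raw.str.replace(r"[^\d.\-]", "", regex=True)
print(cleaned.tolist())  # ['2645', '5.27', '-0.29']

values = pd.to_numeric(cleaned)
print(values.dtype)  # float64
```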
Upvotes: 0
Reputation: 388
You can try this:
df['2nd'] = pd.to_numeric(df['2nd'].str.replace(',', ''))
df['CTR'] = pd.to_numeric(df['CTR'].str.replace('%', ''))
Upvotes: 1