Reputation: 9394
I get the following output when I call df.info() on a DataFrame loaded from an Excel file:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30000 entries, 1 to 30000
Data columns (total 25 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 30000 non-null object
1 X1 30000 non-null object
2 X2 30000 non-null object
3 X3 29669 non-null object
4 X4 29677 non-null object
5 X5 30000 non-null object
6 X6 30000 non-null object
7 X7 30000 non-null object
8 X8 30000 non-null object
9 X9 30000 non-null object
10 X10 30000 non-null object
11 X11 30000 non-null object
12 X12 30000 non-null object
13 X13 30000 non-null object
14 X14 30000 non-null object
15 X15 30000 non-null object
16 X16 30000 non-null object
17 X17 30000 non-null object
18 X18 30000 non-null object
19 X19 30000 non-null object
20 X20 30000 non-null object
21 X21 30000 non-null object
22 X22 30000 non-null object
23 X23 30000 non-null object
24 Y 30000 non-null object
dtypes: object(25)
memory usage: 2.9+ MB
I do not know why every column has the dtype object even though most of them hold numerical values. How can I fix the data types of my dataset?
Upvotes: 3
Views: 6270
Reputation: 323226
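Try pd.to_numeric, applied column by column. With errors='ignore', any column that cannot be parsed as numbers is returned unchanged (pandas 2.2 and later deprecate this option):
import pandas as pd
df = pd.DataFrame({'1': ['1', '2'], '2': ['a', 'b']})
df = df.apply(pd.to_numeric, errors='ignore')
Check the result: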
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 1 2 non-null int64
1 2 2 non-null object
dtypes: int64(1), object(1)
memory usage: 88.0+ bytes
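Since errors='ignore' is deprecated in recent pandas releases, here is a minimal sketch of the same idea that catches the exception explicitly. The helper name to_numeric_where_possible and the sample columns are made up for illustration; they are not part of pandas or of the asker's data.

```python
import pandas as pd


def to_numeric_where_possible(frame):
    """Return a copy in which every column that parses as numbers is converted."""
    out = frame.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # At least one value is not numeric, so leave this column unchanged.
            pass
    return out


# Hypothetical sample: numeric strings, plain text, and a column with a missing value.
df = pd.DataFrame({"X1": ["1", "2"], "X2": ["a", "b"], "X3": ["1.5", None]})
converted = to_numeric_where_possible(df)
print(converted.dtypes)
```

Columns that contain any unparseable value stay as they were, which mirrors what errors='ignore' did. A missing value does not block the conversion; it ends up as a missing entry in a floating-point column.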
Upvotes: 3
Reputation: 322
Try for example:
df['X1'] = df['X1'].astype(str).astype(int)
If you want to convert all columns at once, try:
df = df.astype(int)
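This is needed because, when you import a file such as a .csv, any column that pandas cannot parse as numbers is stored as object. Note that astype(int) raises an error if a column contains missing values (as X3 and X4 do here) or non-numeric text.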
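For columns with missing values, something along these lines may work better than a plain astype(int). This is only a sketch on made-up sample data: it assumes that unparseable entries may be turned into missing values (errors='coerce') and that the remaining values are whole numbers, so pandas' nullable Int64 dtype can hold them.

```python
import pandas as pd

# Hypothetical sample: X3 has a missing value, like X3 and X4 in the question.
df = pd.DataFrame({"X3": ["1", None, "3"], "X5": ["10", "20", "30"]})

# A plain integer cast cannot represent the missing entry.
try:
    df.astype(int)
except (ValueError, TypeError) as exc:
    print(f"astype(int) failed: {exc}")

# Parse every column; anything unparseable becomes a missing value.
numeric = df.apply(pd.to_numeric, errors="coerce")

# The nullable integer dtype keeps whole numbers and marks gaps as <NA>.
nullable = numeric.astype("Int64")
print(nullable.dtypes)
```

Be aware that errors='coerce' silently discards text values, so it is worth checking the count of missing values before and after the conversion.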
Upvotes: 0