Reputation: 409
Here are the first 5 rows of the DataFrame (poorly formatted, but you can see that most of these values are convertible to numbers):
df.head()
ID Overall Acceleration Aggression Agility Balance Ball control Composure Crossing Curve Dribbling Finishing Free kick accuracy GK diving GK handling GK kicking GK positioning GK reflexes Heading accuracy Interceptions Jumping Long passing Long shots Marking Penalties Positioning Reactions Short passing Shot power Sliding tackle Sprint speed Stamina Standing tackle Strength Vision Volleys
0 20801 94 89 63 89 63 93 95 85 81 91 94 76 7 11 15 14 11 88 29 95 77 92 22 85 95 96 83 94 23 91 92 31 80 85 88
1 158023 93 92 48 90 95 95 96 77 89 97 95 90 6 11 15 14 8 71 22 68 87 88 13 74 93 95 88 85 26 87 73 28 59 90 85
2 190871 92 94 56 96 82 95 92 75 81 96 89 84 9 9 15 15 11 62 36 61 75 77 21 81 90 88 81 80 33 90 78 24 53 80 83
3 176580 92 88 78 86 60 91 83 77 86 86 94 84 27 25 31 33 37 77 41 69 64 86 30 85 92 93 83 87 38 77 89 45 80 84 88
4 167495 92 58 29 52 35 48 70 15 14 30 13 11 91 90 95 91 89 25 30 78 59 16 10 47 12 85 55 25 11 61 44 10 83 70 11
Here is a description of all of the types:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 18085 entries, 0 to 18084
Data columns (total 36 columns):
ID 18085 non-null int64
Overall 18085 non-null int64
Acceleration 18085 non-null object
Aggression 18085 non-null object
Agility 18085 non-null object
Balance 18085 non-null object
Ball control 18085 non-null object
Composure 18085 non-null object
Crossing 18085 non-null object
Curve 18085 non-null object
Dribbling 18085 non-null object
Finishing 18085 non-null object
Free kick accuracy 18085 non-null object
...
dtypes: int64(2), object(34)
memory usage: 5.1+ MB
Here is my attempt to convert the object types to floats.
for column in full:
    tmp = pd.Series(column)
    column = tmp.astype("float64", errors="ignore")
And afterwards all of the relevant types are still "object."
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 18085 entries, 0 to 18084
Data columns (total 36 columns):
ID 18085 non-null int64
Overall 18085 non-null int64
Acceleration 18085 non-null object
Aggression 18085 non-null object
Agility 18085 non-null object
Balance 18085 non-null object
Ball control 18085 non-null object
Composure 18085 non-null object
Crossing 18085 non-null object
Curve 18085 non-null object
Dribbling 18085 non-null object
Finishing 18085 non-null object
Free kick accuracy 18085 non-null object
...
dtypes: int64(2), object(34)
memory usage: 5.1+ MB
Can anybody see what I'm doing wrong? I've tried many different approaches from this site and others but I can't understand why the types aren't being changed. Any help is appreciated. Thank you.
Edit: I am doing this in a Kaggle.com IPython notebook if that could have something to do with this.
Upvotes: 4
Views: 17974
Reputation: 2337
The object column types are likely due to empty values somewhere in the columns. If you are working with a large data table and want to handle everything as strings automatically (including when filtering on a column), you need to fill the empty cells with something. I use the following two steps:
df = df.astype("string") # convert all columns to string
df = df.fillna("NULL") # fill any empty cells with a "NULL" string
or
df = df.astype("string").fillna("NULL") # shortened version
Filling the empty cells helps prevent any dataframe queries, such as:
df = df[df['ReturnURL'].str.contains('AppSource=production')]
from returning unexpected results.
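As a rough illustration of the string-plus-fill approach, here is a minimal sketch. The sample URLs are made up for the example, and only the `ReturnURL` column name comes from the query above. Note that this keeps every column as text rather than turning it into numbers.

```python
import pandas as pd

# Hypothetical sample data: one of the three cells is empty.
df = pd.DataFrame({
    "ReturnURL": [
        "https://example.com/?AppSource=production",
        None,
        "https://example.com/?AppSource=test",
    ]
})

# Convert every column to the pandas string dtype, then fill the gaps.
df = df.astype("string").fillna("NULL")

# With no missing cells left, the mask contains only True/False values.
mask = df["ReturnURL"].str.contains("AppSource=production", regex=False)
filtered = df[mask]
print(filtered)
```

The placeholder `"NULL"` is an ordinary string here, so later code that needs to tell real values from filled ones has to check for it explicitly.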
Upvotes: 0
Reputation: 2133
Migrating solution from comments to answers. Thanks to @Wen.
df = df.apply(pd.to_numeric, errors='coerce')
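To see what that one-liner does, here is a small sketch on a made-up frame. The column names are borrowed from the question, but the values (including the unparseable `"70+2"`) are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical miniature of the question's frame: the skill columns hold
# strings (object dtype), and one cell ("70+2") is not a plain number.
df = pd.DataFrame({
    "ID": [20801, 158023, 190871],
    "Acceleration": pd.Series(["89", "92", "94"], dtype=object),
    "Aggression": pd.Series(["63", "48", "70+2"], dtype=object),
})

# Convert column by column; cells that cannot be parsed become NaN.
converted = df.apply(pd.to_numeric, errors="coerce")
print(converted.dtypes)

# Cells that were present before but are NaN now were coerced away,
# so list the original rows that contain them to see what was lost.
coerced = converted.isna() & df.notna()
print(df[coerced.any(axis=1)])
```

As for why the loop in the question has no effect (assuming `full` is the DataFrame): iterating over a DataFrame yields the column labels, so `pd.Series(column)` wraps the label string rather than the column's data, and rebinding the loop variable `column` never writes anything back to the frame.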
Upvotes: 6