Reputation: 1205
I am trying to convert a column with postal codes of 'object' type to 'int'
df['ZIP'] = df['ZIP'].astype(str).astype(int)
My data is more than 100000 records, and it keeps throwing message with different literals that are invalid in that column. I understand the type of data does not match, and the conversion fails.
ValueError: invalid literal for int() with base 10: ' '
To correct the above error, I replaced the 'empty rows' with nan and dropped them using below code:
df['ZIP'] = df['ZIP'].replace('', np.nan)
df['ZIP'] = df.dropna(subset=['ZIP'])
After which I get the below error again.
ValueError: invalid literal for int() with base 10: 'SAM'
Is there a efficient way to remove all invalid literals without all these steps?
Upvotes: 4
Views: 4031
Reputation:
First, convert to numeric with parameter errors='coerce'
so that the ones cannot be converted will be NaN. Then, drop them and cast the Series as integer.
df['ZIP'] = pd.to_numeric(df['ZIP'], errors='coerce')
df = df.dropna(subset=['ZIP'])
df['ZIP'] = df['ZIP'].astype('int')
Upvotes: 11