Hackerds
Hackerds

Reputation: 1205

Pandas: Removing invalid literals in a column while converting object to int

I am trying to convert a column with postal codes of 'object' type to 'int'

df['ZIP'] = df['ZIP'].astype(str).astype(int)

My data is more than 100000 records, and it keeps throwing message with different literals that are invalid in that column. I understand the type of data does not match, and the conversion fails.

ValueError: invalid literal for int() with base 10: '     '

To correct the above error, I replaced the 'empty rows' with nan and dropped them using below code:

   df['ZIP'] = df['ZIP'].replace('', np.nan)
   df['ZIP'] = df.dropna(subset=['ZIP'])

After which I get the below error again.

  ValueError: invalid literal for int() with base 10: 'SAM'

Is there a efficient way to remove all invalid literals without all these steps?

Upvotes: 4

Views: 4031

Answers (1)

user2285236
user2285236

Reputation:

First, convert to numeric with parameter errors='coerce' so that the ones cannot be converted will be NaN. Then, drop them and cast the Series as integer.

df['ZIP'] = pd.to_numeric(df['ZIP'], errors='coerce')
df = df.dropna(subset=['ZIP'])
df['ZIP'] = df['ZIP'].astype('int')

Upvotes: 11

Related Questions