TrueLies
TrueLies

Reputation: 1

CSV using '-' as NULL. Error to convert column to INT

I have a CSV

df = pd.read_csv('data.csv')

Table:

Column A Column B Column C
4068744 -1472525 2596219
198366 - -

The file is using '-' for nul values

I tried converting to int without handling that '-'.

My question is: how do I strip the string '-' without changing the negative values?

df['Column B'] = df['Column B'].astype(int)

ValueError: invalid literal for int() with base 10: '-'

Upvotes: 0

Views: 85

Answers (1)

JayPeerachai
JayPeerachai

Reputation: 3842

Higher version of pandas can hold integer dtypes with missing values. Normal int conversion doesn't support null values.

# replace - with null
df.replace('-', pd.NA, inplace=True)
# and use Int surrounding with ''
df['Column B'] = df['Column B'].astype('Int64')

output:

> df

  Column A  Column B Column C
0  4068744  -1472525  2596219
1   198366      <NA>     <NA>

> df['Column B'].info

Name: Column B, dtype: Int64>

Upvotes: 0

Related Questions