Reputation: 1
I have a CSV
df = pd.read_csv('data.csv')
Table:
Column A | Column B | Column C |
---|---|---|
4068744 | -1472525 | 2596219 |
198366 | - | - |
The file is using '-' for nul values
I tried converting to int without handling that '-'.
My question is: how do I strip the string '-' without changing the negative values?
df['Column B'] = df['Column B'].astype(int)
ValueError: invalid literal for int() with base 10: '-'
Upvotes: 0
Views: 85
Reputation: 3842
Higher version of pandas
can hold integer
dtypes with missing values. Normal int
conversion doesn't support null values.
# replace - with null
df.replace('-', pd.NA, inplace=True)
# and use Int surrounding with ''
df['Column B'] = df['Column B'].astype('Int64')
output:
> df
Column A Column B Column C
0 4068744 -1472525 2596219
1 198366 <NA> <NA>
> df['Column B'].info
Name: Column B, dtype: Int64>
Upvotes: 0