Reputation: 405
This is the error that shows up whenever I try to convert the dataframe to int:
("invalid literal for int() with base 10: '260,327,021'", 'occurred at index Population1'
Everything in the df is a number. I assume the error is due to the extra quote at the end, but how do I fix it?
Upvotes: 23
Views: 85177
Reputation: 1058
My case was a bit different.
I loaded my dataframe like this:
my_converter = {'filename': str, 'revision_id': int}
df = pd.read_csv("my.csv", header=0, sep="\t", converters=my_converter)
because the output of head -n 3 my.csv looked like this:
"filename" "revision_id"
"some_filename.pdf" "224"
"another_filename.pdf" "128"
However, down thousands of lines, there was an entry like this:
"very_\"special\"_filename.pdf" "46"
which meant that I had to specify the escape character for read_csv(). Otherwise, it would try to cast special as int for the revision_id field and raise the error.
So the correct call is:
df = pd.read_csv("my.csv", header=0, sep="\t", escapechar='\\', converters=my_converter)
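To try this without a file on disk, here is a minimal sketch that feeds the same kind of rows to read_csv() from memory. The raw string is a made-up stand-in for my.csv, and the names special_name and special_rev exist only for this illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for my.csv: tab-separated, every field quoted,
# and one filename containing backslash-escaped quotes.
raw = (
    '"filename"\t"revision_id"\n'
    '"some_filename.pdf"\t"224"\n'
    '"very_\\"special\\"_filename.pdf"\t"46"\n'
)

my_converter = {"filename": str, "revision_id": int}

# escapechar tells the parser that \" inside a quoted field is a literal quote.
df = pd.read_csv(
    io.StringIO(raw),
    header=0,
    sep="\t",
    escapechar="\\",
    converters=my_converter,
)

special_name = df.loc[1, "filename"]
special_rev = df.loc[1, "revision_id"]
print(special_name, special_rev)
```

The escaped quotes end up as plain quote characters inside the filename, and revision_id is converted without complaint.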
Upvotes: 2
Reputation: 5123
I solved the error using pandas.to_numeric
In your case,
data.Population1 = pd.to_numeric(data.Population1, errors="coerce")
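Here, data is the DataFrame. Note that errors="coerce" turns any value that cannot be parsed into NaN.
After that, you can convert the float column to int as well, as long as no NaN values remain:
data.Population1 = data.Population1.astype(int)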
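One caveat worth checking on your own data: to_numeric does not understand thousands separators, so with errors="coerce" a string like '1,234' silently becomes NaN. A rough sketch on made-up sample values (the Series s and the variable names are only for illustration), stripping the commas first and using the nullable Int64 dtype so leftover NaN values do not break the cast:

```python
import pandas as pd

# Made-up sample values: one with a thousands separator, one clean, one junk.
s = pd.Series(["1,234", "56", "abc"])

# Coercing directly loses the comma-formatted value as well as the junk.
naive = pd.to_numeric(s, errors="coerce")

# Strip the commas first, then coerce; only the real junk becomes NaN.
cleaned = pd.to_numeric(s.str.replace(",", "", regex=False), errors="coerce")

# The nullable integer dtype keeps missing values as <NA> instead of raising.
ints = cleaned.astype("Int64")
print(ints)
```

Whether you keep the missing values, fill them, or drop them depends on what they mean in your data.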
Upvotes: 7
Reputation: 760
Others might hit the same error when the string represents a float:
>>> int("34.54545")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '34.54545'
The workaround for this is to convert to a float first and then to an int:
>>> int(float("34.54545"))
34
Or, pandas-specific:
df.astype(float).astype(int)
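Keep in mind that going from float to int truncates toward zero rather than rounding. A small sketch on made-up values (the Series s is only an example) showing both behaviours side by side:

```python
import pandas as pd

# Made-up float-like strings, including a negative one.
s = pd.Series(["34.54545", "-2.9", "7"])

# float -> int drops the fractional part (truncation toward zero).
truncated = s.astype(float).astype(int)

# Round first if you want the nearest integer instead.
rounded = s.astype(float).round().astype(int)

print(truncated.tolist())
print(rounded.tolist())
```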
Upvotes: 16
Reputation: 294506
I ran this
int('260,327,021')
and got this
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-448-a3ba7c4bd4fe> in <module>()
----> 1 int('260,327,021')

ValueError: invalid literal for int() with base 10: '260,327,021'
I assure you that not everything in your dataframe is a number. It may look like a number, but it is a string with commas in it.
You'll want to remove the commas and then convert to int:
pd.Series(['260,327,021']).str.replace(',', '').astype(int)
0 260327021
dtype: int64
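Since the question is about converting the whole dataframe, the same idea can be applied column by column. This is only a sketch: Population1 comes from the error message, while Population2 and the sample values are made up, and it assumes every column holds comma-formatted integer strings.

```python
import pandas as pd

# Made-up frame; every column is a string with thousands separators.
df = pd.DataFrame(
    {
        "Population1": ["260,327,021", "1,000"],
        "Population2": ["12,345", "678"],
    }
)

# Remove the commas in each column, then cast that column to int.
df_int = df.apply(lambda col: col.str.replace(",", "", regex=False).astype(int))

print(df_int)
```

If the data comes from a CSV file, passing thousands=',' to read_csv() is another option that avoids the cleanup step.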
Upvotes: 20