Reputation: 403
I am reading a xlsx file with pandas and a Column contain 18 digit number for example 360000036011012000
after reading the number is converted to 360000036011011968
my code
import pandas as pd
df = pd.read_excel("Book1.xlsx")
I also tried converting the column to string but the results are same
df = pd.read_excel("Book1.xlsx",dtype = {"column_name":"str" })
also tried with engine = 'openpyxl'
also if the same number is in csv file there is no problem reading works fine but I have to read it from excel only.
Upvotes: 6
Views: 1567
Reputation: 9197
That is an Excel problem, not a pandas problem. See here:
The yellow marked entries, are actually the number below * 10 +1 so should not end on 0.
What happens under the hood in Excel seems to be a number limit of 18. But the last two numbers are interpreted as decimals. Since this is a Excel not a CSV problem, a csv will work just fine.
Solution:
Format the numbers in Excel as Text, as shown in the first picture with: =Text(CELL,0). Pandas can then import it as string, but you will lose the information of the last digits. Therefore Excel should not be used for numbers with more than 18 digits. Use a different file, like csv, insert the numbers directly as strings into excel by using a leading: ' symbol.
Upvotes: 1