Reputation: 91
Problem: I'm trying to store big datasets using Pandas dataframes in Python. My trouble is that when I save them to CSV, chunks of my data get truncated, like this:
e+12
and
[value1 value2 value3 ... value1853 value1854]
Explanation: I need to store lots of data in single cells, and some of the values I need to store are long (time) values. I wrote a short script to reproduce the errors I'm getting:
import numpy as np
import pandas as pd

dframe = pd.DataFrame()
arr = np.array([])
for x in range(1234567891230, 1234567892230):
    arr = np.append(arr, x)
dframe['elements'] = [arr]
print(dframe['elements'][0][999]) # prints correct values, eg. 1234567892229.0
dframe.to_csv('temp.csv', index=False)
In the example above, each of the 1000 stored values (1234567891230 to 1234567892229) appears as:
1.23456789e+12
This drops the four least significant digits. If you extend the list to 1001 values, even more gets truncated:
dframe = pd.DataFrame()
arr = np.array([])
for x in range(1234567891230, 1234567892231):
    arr = np.append(arr, x)
dframe['elements'] = [arr]
print(dframe['elements'][0][999]) # still prints correct values, eg. 1234567892229.0
dframe.to_csv('temp.csv', index=False)
And the full CSV file finally looks like this:
elements
"[1.23456789e+12 1.23456789e+12 1.23456789e+12 ... 1.23456789e+12 1.23456789e+12 1.23456789e+12]"
This removes almost all of the 1001 elements and replaces them with a literal "...".
Does anyone know any workaround for these problems or how to solve them?
This is not just display truncation (the way Pandas' to_html() truncates string contents); it actually corrupts the data written to CSV.
Upvotes: 5
Views: 6915
Reputation: 91
Changing the data type as @Jacob Tomlinson suggested solves one problem; looking into NumPy's array2string solved the other.
Adding np.set_printoptions(threshold=np.nan) (on newer NumPy versions, threshold must be an integer such as sys.maxsize, since np.nan is rejected) stops to_csv from truncating the output strings.
import sys

import numpy as np
import pandas as pd

dframe = pd.DataFrame()
arr = np.array([])
for x in range(1234567891230, 1234567892230):
    arr = np.append(arr, x)
dframe['elements'] = [arr.astype('uint64')]
print(dframe['elements'][0][999]) # prints correct values, eg. 1234567892229
np.set_printoptions(threshold=sys.maxsize)  # older NumPy accepted threshold=np.nan
dframe.to_csv('temp.csv', index=False)
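The fix above leans on NumPy's print formatting, which makes the CSV contents depend on global print options. A formatting-independent alternative, sketched here under the assumption that a JSON string in the cell is acceptable for whatever reads the file later:

```python
import io
import json

import numpy as np
import pandas as pd

# Serialize the array as JSON text inside the cell, so the CSV content
# no longer depends on NumPy's print settings.
arr = np.arange(1234567891230, 1234567892230, dtype='uint64')
dframe = pd.DataFrame({'elements': [json.dumps(arr.tolist())]})

buf = io.StringIO()  # stands in for a file such as 'temp.csv'
dframe.to_csv(buf, index=False)
buf.seek(0)

# Reading back recovers every value exactly.
restored = np.array(json.loads(pd.read_csv(buf)['elements'][0]),
                    dtype='uint64')
```

json.loads gives plain Python ints of arbitrary precision, so no digits are lost on the round trip.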
Upvotes: 4
Reputation: 661
So, replicating your code on my machine, I see the rounding but not the truncation of the list.

I do not know the best solution, but here are some suggestions:

Do you need the file on disk to be human readable? What system will read it later? You can easily unpack it with just about any tool out there.

Upvotes: 0
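Reading this answer as a suggestion to skip human-readable text entirely (an assumption, since the original snippet is missing), a minimal sketch using NumPy's own .npy format, which stores the array bit-for-bit so nothing is rounded or elided by string formatting:

```python
import io

import numpy as np

arr = np.arange(1234567891230, 1234567892230, dtype='int64')

buf = io.BytesIO()        # stands in for a file such as 'temp.npy'
np.save(buf, arr)         # exact binary dump, no string formatting involved
buf.seek(0)

restored = np.load(buf)   # int64 values come back exactly
```

With a real file you would simply call np.save('temp.npy', arr) and np.load('temp.npy').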
Reputation: 3793
Try setting the dtype of your numpy array to an integer.
import numpy as np
import pandas as pd

dframe = pd.DataFrame()
arr = np.array([], dtype='int64')  # wide enough for these 13-digit values
for x in range(1234567891230, 1234567892230):
    arr = np.append(arr, x)
dframe['elements'] = [arr]
print(dframe['elements'][0][999]) # prints correct values, eg. 1234567892229
dframe.to_csv('temp.csv', index=False)
Elements
"[1234567891230 1234567891231 1234567891232 ... 1234567891233 1234567891234]"
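The dtype change helps because the "rounding" in the question is purely a printing effect: float64 stores integers of this size exactly (they are below 2**53), but NumPy's default array formatting shows only about 8 significant digits. A small sketch illustrating the distinction:

```python
import numpy as np

# The value itself survives storage in float64 without any precision loss...
x = np.float64(1234567892229)
exact = int(x)  # recovers 1234567892229 exactly

# ...but printing a float64 *array* abbreviates it, which is where the
# 'e+12' display in the question comes from.
shown = np.array2string(np.array([1234567892229.0]))
```

Integer dtypes avoid the issue because they are always printed with every digit.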
Upvotes: 1