Frank Krueger

Reputation: 70993

How to output list of floats to a binary file in Python

I have a list of floating-point values in Python:

floats = [3.14, 2.7, 0.0, -1.0, 1.1]

I would like to write these values out to a binary file using IEEE 32-bit encoding. What is the best way to do this in Python? My list actually contains about 200 MB of data, so something "not too slow" would be best.

Since there are 5 values, I just want a 20-byte file as output.

Upvotes: 38

Views: 67625

Answers (8)

Grant

Reputation: 2908

I'm not sure how NumPy will compare performance-wise for your application, but it may be worth investigating.

Using NumPy:

import numpy as np

# floats is the list from the question; float32 is the 4-byte IEEE type
a = np.array(floats, dtype=np.float32)
with open('file', 'wb') as output_file:
    a.tofile(output_file)

This results in a 20-byte file as well.

Upvotes: 11

louis

Reputation: 61

My "answer" is really a comment on the various answers. I can't comment since I don't have 50 reputation.

If the file is to be read back by Python, then use the "pickle" module. This one tool can read and write many things in binary.

But the question says "IEEE 32-bit encoding", which suggests the file will be read back from other languages. In that case the byte order should be specified. Most machines are x86, which is little-endian, but Java/JVM, which is widely used for data processing, reads binary data as big-endian by default (for example in DataInputStream). NumPy's tofile() writes the array in its own byte order, which is normally the machine's native little-endian order, so Java code that decodes the file as big-endian will read wrong values.
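To make the byte-order point concrete, here is a small sketch using the standard struct module. The value 1.0 is just an illustrative choice; the same 32-bit float comes out as the same four bytes in opposite order depending on the prefix character:

```python
import struct
import sys

# 1.0 as an IEEE 754 single-precision float is 0x3F800000.
little = struct.pack('<f', 1.0)   # explicit little-endian
big = struct.pack('>f', 1.0)      # explicit big-endian (what Java's DataInputStream reads)

print(little.hex())   # 0000803f
print(big.hex())      # 3f800000
print(sys.byteorder)  # byte order of the machine running this script
```

With no prefix, struct uses the machine's native order, so stating '<' or '>' explicitly is the safer choice for files that leave the machine.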

To work with JVM:

# convert to bytes, BIG endian, for use by Java
import struct
f = [3.14, 2.7, 0.0, -1.0, 1.1]
b = struct.pack('>'+'f'*len(f), *f)

with open("f.bin", "wb") as file:
    file.write(b)

On the Java side:

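import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// DataInputStream.readFloat() reads 4 bytes as a big-endian IEEE 754 float
try (var stream = new DataInputStream(new FileInputStream("f.bin"))) {
    for (int i = 0; i < 5; i++)
        System.out.println(stream.readFloat());
} catch (IOException ex) {
    ex.printStackTrace(); // report I/O errors instead of silently swallowing them
}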

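One concern with the Python code is 'f'*len(f): the interpreter really does build a very long "ffffff..." string. A repeat count in the format string, such as '>%df' % len(f), avoids that, although *f still unpacks the whole list into function arguments.

I would use a NumPy array and byteswap() instead: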

import numpy, sys
f = numpy.array([3.14, 2.7, 0.0, -1.0, 1.1], dtype=numpy.float32)

if sys.byteorder == "little":
    f.byteswap().tofile("f.bin") # using BIG endian, for use by Java
else:
    f.tofile("f.bin")
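
If NumPy is not available, the same idea can be sketched with the standard-library array module, whose byteswap() method swaps the bytes of every item in place. The file name f_be.bin and the helper write_big_endian_floats are made up for this illustration, and the sketch assumes the platform's C float is the 4-byte IEEE format:

```python
import struct
import sys
from array import array

def write_big_endian_floats(values, path):
    """Write values as 32-bit floats in big-endian order (hypothetical helper)."""
    a = array('f', values)       # 'f' = C float, 4 bytes on common platforms
    if sys.byteorder == 'little':
        a.byteswap()             # in-place swap to big-endian
    with open(path, 'wb') as out:
        a.tofile(out)

write_big_endian_floats([3.14, 2.7, 0.0, -1.0, 1.1], 'f_be.bin')

# Compare the file against struct's explicit big-endian packing.
with open('f_be.bin', 'rb') as inp:
    data = inp.read()
print(len(data))   # 20
print(data == struct.pack('>5f', 3.14, 2.7, 0.0, -1.0, 1.1))
```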

Upvotes: 6

Nadia Alramli

Reputation: 114963

Alex is absolutely right, it's more efficient to do it this way:

from array import array

# 'f' is a 4-byte C float, as the question asks; 'd' would write 8-byte doubles
float_array = array('f', [3.14, 2.7, 0.0, -1.0, 1.1])
with open('file', 'wb') as output_file:
    float_array.tofile(output_file)

And then read the array back like this:

float_array = array('f')
with open('file', 'rb') as input_file:
    # frombytes() replaces fromstring(), which was removed in Python 3.9
    float_array.frombytes(input_file.read())

array.array objects also have a .fromfile method, which can be used to read the file if you know the number of items in advance (e.g. from the file size, or some other mechanism).
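
As a rough sketch of the .fromfile route, the item count can be derived from the file size divided by the array's itemsize. The file name floats_demo.bin is only illustrative, and the 'f' typecode is used here to get 4-byte floats:

```python
import os
from array import array

path = 'floats_demo.bin'   # illustrative file name

# Write five 4-byte floats so there is something to read back.
with open(path, 'wb') as out:
    array('f', [3.14, 2.7, 0.0, -1.0, 1.1]).tofile(out)

# Derive the item count from the file size, then read exactly that many items.
values = array('f')
count = os.path.getsize(path) // values.itemsize
with open(path, 'rb') as inp:
    values.fromfile(inp, count)

print(count)       # 5
print(values[3])   # -1.0
```

fromfile raises EOFError if fewer than count items are available, so computing the count from the actual size avoids guessing.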

Upvotes: 50

Alex Martelli

Reputation: 881913

The array module in the standard library may be more suitable for this task than the struct module which everybody is suggesting. Performance with 200 MB of data should be substantially better with array.

If you'd like to look at a variety of options, try profiling them on your system.
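
A minimal timing sketch along those lines (an illustration, not the answer's own code): it times struct.pack against array.tofile on sample data using timeit. The function names, the data size, and the temporary file name are all assumptions made for the example, and the numbers will differ from machine to machine:

```python
import os
import struct
import tempfile
import timeit
from array import array

floats = [3.14, 2.7, 0.0, -1.0, 1.1] * 20000   # 100,000 sample values
path = os.path.join(tempfile.gettempdir(), 'float_bench.bin')

def with_struct():
    with open(path, 'wb') as out:
        out.write(struct.pack('%df' % len(floats), *floats))

def with_array():
    # Includes the list-to-array conversion, since the data starts as a list.
    with open(path, 'wb') as out:
        array('f', floats).tofile(out)

results = {}
for func in (with_struct, with_array):
    # Best of 3 repeats, 5 calls each.
    results[func.__name__] = min(timeit.repeat(func, number=5, repeat=3))
    print('%-12s %.4f s' % (func.__name__, results[func.__name__]))

os.remove(path)
```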

Upvotes: 14

dino

Reputation: 3307

I ran into a similar issue while inadvertently writing a 100+ GB CSV file. The answers here were extremely helpful but, to get to the bottom of it, I profiled all of the solutions mentioned and then some. All profiling runs were done on a 2014 MacBook Pro with an SSD using Python 2.7. In these runs, the struct approach was the fastest:

6.465 seconds print_approach    print list of floats
4.621 seconds csv_approach      write csv file
4.819 seconds csvgz_approach    compress csv output using gzip
0.374 seconds array_approach    array.array.tofile
0.238 seconds numpy_approach    numpy.array.tofile
0.178 seconds struct_approach   struct.pack method

Upvotes: 6

joshdick

Reputation: 1071

struct.pack() looks like what you need.

http://docs.python.org/library/struct.html

Upvotes: 3

thesamet

Reputation: 6582

See: Python's struct module

import struct

# '%df' % len(floats) gives a repeat count such as '5f'; native byte order is used
s = struct.pack('%df' % len(floats), *floats)
with open('file', 'wb') as f:
    f.write(s)
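
For a list as large as the one in the question, unpacking every value into a single struct.pack call builds one huge argument list and one huge bytes object. One possible variation, sketched here with a made-up helper name write_floats_chunked and an arbitrary default chunk size, packs and writes the list in slices:

```python
import struct

def write_floats_chunked(values, path, chunk_size=65536):
    """Write values as native-order 32-bit floats, chunk_size items at a time."""
    with open(path, 'wb') as out:
        for start in range(0, len(values), chunk_size):
            chunk = values[start:start + chunk_size]
            # '%df' is a repeat count, so the format string stays short.
            out.write(struct.pack('%df' % len(chunk), *chunk))

floats = [3.14, 2.7, 0.0, -1.0, 1.1]
write_floats_chunked(floats, 'chunked.bin', chunk_size=2)

with open('chunked.bin', 'rb') as inp:
    data = inp.read()
print(len(data))   # 20
```

The output is byte-for-byte the same as a single struct.pack call; only the peak memory use changes.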

Upvotes: 25

SilentGhost

Reputation: 319701

Have a look at struct.pack_into.
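
A short sketch of how pack_into might be used here: it packs values into a preallocated writable buffer at a given offset instead of returning a new bytes object. The little-endian '<' prefix and the file name packed.bin are choices made for this example only:

```python
import struct

floats = [3.14, 2.7, 0.0, -1.0, 1.1]
fmt = struct.Struct('<%df' % len(floats))   # little-endian, 4 bytes per value

buf = bytearray(fmt.size)                   # preallocated, writable buffer
fmt.pack_into(buf, 0, *floats)              # pack all values starting at offset 0

with open('packed.bin', 'wb') as out:
    out.write(buf)

print(fmt.size)                               # 20
print(struct.unpack_from('<f', buf, 12)[0])   # -1.0
```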

Upvotes: 3
