Reputation: 7489
I have 10,000 binary files, named like this:
file0.bin
file1.bin
...
file10000.bin
Each of the above files contains exactly 391 float values (1564 bytes per file).
My goal is to read all of the files into a Python array in the fastest way possible. If I open and close each file using a script, it takes a lot of time (about 8 minutes!). Are there any other creative ways to read these files FAST?
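(For illustration, the per-file loop looks roughly like this; it is a simplified sketch rather than my exact script:)

import struct

values = []
for i in range(10001):                       # file0.bin .. file10000.bin
    with open("file%d.bin" % i, "rb") as f:  # open & close every single file
        data = f.read(1564)                  # 391 floats * 4 bytes each
        values.extend(struct.unpack("391f", data))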
I am using Ubuntu Linux and would prefer a solution that can work with Python. Thanks.
Upvotes: 3
Views: 709
Reputation: 96071
You have 10001 files (0 to 10000 inclusive) and it takes 8 minutes to run the following?
try:
    xrange  # Python 2/3 compatibility: xrange only exists on Python 2
except NameError:
    xrange = range

import array

final = array.array('f')
for file_seq in xrange(10001):
    with open("file%d.bin" % file_seq, "rb") as fp:
        final.fromfile(fp, 391)
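As a quick sanity check (a minimal timing sketch, not tested against your data), you could wrap the loop with a timer to see how long the raw reads actually take:

import time
import array

start = time.time()
final = array.array('f')
for file_seq in range(10001):
    with open("file%d.bin" % file_seq, "rb") as fp:
        final.fromfile(fp, 391)
print("read %d floats in %.2f seconds" % (len(final), time.time() - start))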
What's the underlying filesystem? How much RAM do you have? What's your processor and its speed?
Upvotes: 0
Reputation: 49095
If you want it to go even faster, make a ramdisk:
# mkfs -q /dev/ram1 $(( 2 * 10000)) ## roughly the size you need
# mkdir -p /ramcache
# mount /dev/ram1 /ramcache
# df -H | grep ramcache
Now concatenate the files:
# cat file{0..10000}.bin >> /ramcache/concat.bin ## thanks SiegeX
Then run your script on that file.
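Reading the concatenated file back is then a single call (a rough sketch, untested, assuming the concatenation above produced /ramcache/concat.bin):

import array
import os

final = array.array('f')
path = "/ramcache/concat.bin"
count = os.path.getsize(path) // final.itemsize   # total number of 4-byte floats in the file
with open(path, "rb") as fp:
    final.fromfile(fp, count)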
Since I haven't tested this, I prefixed the shell commands with '#' so that you won't have any accidents. Just remove the '#' if you want them to work.
This is an option, but I would urge you to look at the comments people have posted directly under your question. You could probably get better results by examining what you are doing wrong, as I could not reproduce your speed problem of 8 minutes.
Upvotes: 2
Reputation: 34708
Iterate over them and use the optimise flag. You might also want to process them using PyPy; it compiles Python via a JIT compiler, allowing for a somewhat marked increase in speed.
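For example (a sketch; read_files.py is just a placeholder for whatever your script is called):

python -O read_files.py   # -O is the optimise flag: strips asserts and sets __debug__ to False
pypy read_files.py        # run the same script under PyPy's JIT (requires PyPy to be installed)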
Upvotes: 0