Reputation: 531
I have this code to check the file type of each file in a directory. It only needs to read the first 4 bytes and compare them against a pattern.
The code looks a bit convoluted and is really slow, but I can't figure out a faster way to do it in Nim.
What am I doing wrong?
import os
var
  buf {.noinit.}: array[4, char]
let out_pat = ['{', '\\', 'r', 't']
var
  flag = true
  num_read = 0
var dirname = "/some/path/*"
for path in walkFiles(dirname):
  num_read = open(path).readChars(buf, 0, 4)
  for i in 0..num_read-1:
    if buf[i] != out_pat[i]:
      flag = false
  if flag:
    echo path
  flag = true
For comparison, here is Python code that is 2x faster:
import glob

def find_rtf(dir_):
    for path in glob.glob(dir_):
        with open(path, 'rb') as f:
            if f.read(4) == b'{\\rt':
                print(path)

find_rtf("/some/path/*")
And a regular command-line pipeline, which is about 10x faster than Python but runs into some pipe bug when it encounters 10^6+ files:
time find ./ -type f -print0 | LC_ALL=C xargs -0 -P 6 -n 100 head -c 5 -v| grep "{\\\rt" -B 1
Upvotes: 3
Views: 555
Reputation: 5403
On my system (Linux) the Nim version is twice as fast as the Python one. But maybe my files are just wrong. What operating system are you on?
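You should close the files, and your comparison is wrong if a file is shorter than 4 bytes: the loop only compares the bytes that were actually read, so an empty file, for example, is reported as a match. Here's a minor cleanup: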
import os
const
  out_pat = ['{', '\\', 'r', 't']
  dirname = "/some/path/*"
for path in walkFiles(dirname):
  var buf: array[4, char]
  let file = open(path)
  defer: close(file) # Always close file when it goes out of scope
  discard file.readChars(buf, 0, 4)
  if buf == out_pat:
    echo path
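To make the short-file problem concrete, here is a small sketch in Python that mirrors the two comparisons. The function names `prefix_loop_match` and `full_match` are made up for this illustration and do not come from either program above. The first one compares only as many bytes as were read, like the loop in the question; the second one requires all four bytes, like `buf == out_pat` or Python's `f.read(4) == b'{\\rt'`.

```python
PATTERN = b"{\\rt"  # the four bytes: '{', '\', 'r', 't'


def prefix_loop_match(data: bytes) -> bool:
    """Mimic the question's loop: compare only the bytes that were read."""
    flag = True
    for i in range(min(len(data), len(PATTERN))):
        if data[i] != PATTERN[i]:
            flag = False
    return flag


def full_match(data: bytes) -> bool:
    """Require all four bytes to be present and equal to the pattern."""
    return data == PATTERN


def read_head(path: str, n: int = 4) -> bytes:
    """Read at most n bytes from the start of a file, closing it afterwards."""
    with open(path, "rb") as f:
        return f.read(n)
```

With an empty input the loop body never runs, so `prefix_loop_match(b"")` stays true while `full_match(b"")` is false. The same happens for a two-byte file containing only `{\`.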
Make sure you compile with nim -d:release c foobar.nim.
The command line version is much faster because you use 6 processes at the same time. With -P 1 instead of -P 6 it is exactly as fast as the Nim version for me.
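If you want the same kind of concurrency outside of xargs, one possible sketch in Python is below. Note the assumptions: it uses a thread pool rather than separate processes, so it is not an exact equivalent of -P 6, the names `is_rtf` and `find_rtf_parallel` are invented for this example, and I have not measured it. Whether it helps at all depends on your disk and on how much of the directory is already in the OS cache.

```python
import glob
import os
from concurrent.futures import ThreadPoolExecutor

PATTERN = b"{\\rt"  # the four bytes: '{', '\', 'r', 't'


def is_rtf(path):
    """Return True if the file starts with the 4-byte pattern."""
    try:
        with open(path, "rb") as f:
            return f.read(4) == PATTERN
    except OSError:
        # Unreadable files are treated as non-matches in this sketch.
        return False


def find_rtf_parallel(pattern, workers=6):
    """Check all regular files matching the glob pattern using a thread pool."""
    paths = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(is_rtf, paths))  # results keep the input order
    return [p for p, ok in zip(paths, flags) if ok]


if __name__ == "__main__":
    for match in find_rtf_parallel("/some/path/*"):
        print(match)
```

The worker count of 6 is only there to mirror the -P 6 from the question; it is a guess, not a tuned value.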
Upvotes: 4