Reputation: 2539
I'm trying to read a few thousand HTML files stored on disk.
Is there any way to do better than:
import os

for files in os.listdir('.'):
    if files.endswith('.html'):
        with open(files) as f:
            a = f.read()
            # do more stuff
Upvotes: 4
Views: 6056
Reputation: 15434
Here's some code that's significantly faster than with open(...) as f: f.read()
import os

def read_file_bytes(path: str, size: int = -1) -> bytes:
    fd = os.open(path, os.O_RDONLY)
    try:
        if size == -1:
            size = os.fstat(fd).st_size  # one extra syscall to learn the file size
        return os.read(fd, size)
    finally:
        os.close(fd)
If you know the maximum size of the file, pass it in as the size argument so you can avoid the stat call.
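For example, a minimal sketch assuming every file is known to be at most 1 MiB (the bound and file name here are hypothetical):

MAX_HTML_SIZE = 1 << 20  # assumed upper bound on any single file
data = read_file_bytes('index.html', MAX_HTML_SIZE)  # os.read returns at most this many bytes; no fstat needed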
Here's some all-around faster code:
for entry in os.scandir('.'):
    if entry.name.endswith('.html'):
        # On Windows entry.stat(follow_symlinks=False) is free, but on Unix it requires a syscall.
        file_bytes = read_file_bytes(entry.path, entry.stat(follow_symlinks=False).st_size)
        a = file_bytes.decode()  # if a string is needed rather than bytes
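If you want to measure the difference on your own data, one rough way to compare the two approaches is timeit (a sketch; the directory and repeat count are assumptions):

import os
import timeit

def fast_read_all():
    for entry in os.scandir('.'):
        if entry.name.endswith('.html'):
            read_file_bytes(entry.path, entry.stat(follow_symlinks=False).st_size)

def plain_read_all():
    for name in os.listdir('.'):
        if name.endswith('.html'):
            with open(name, 'rb') as f:
                f.read()

print(timeit.timeit(fast_read_all, number=10))   # low-level os.open/os.read path
print(timeit.timeit(plain_read_all, number=10))  # plain open()/read() path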
Upvotes: 0
Reputation: 1147
For a similar problem I have used this simple piece of code:
import glob

for file in glob.iglob("*.html"):
    with open(file) as f:
        a = f.read()
iglob doesn't store all the file names in memory at once, which makes it ideal for a huge directory.
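To make the difference concrete, here is a minimal sketch using the same pattern as above:

import glob

names = glob.glob("*.html")   # builds the complete list of matches in memory
it = glob.iglob("*.html")     # lazy iterator: yields one matching name at a time
first = next(it, None)        # consumes only the first match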
Remember to close files after you have finished; the with open(...) construct takes care of that for you.
Upvotes: 2