Reputation: 6571
What's the easiest efficient way to read from stdin and output every nth byte? I'd like a command-line utility that works on OS X, and would prefer to avoid compiled languages.
This Python script is fairly slow (25s for a 3GB file when n=100000000):
#!/usr/bin/env python
import sys
n = int(sys.argv[1])
while True:
    chunk = sys.stdin.read(n)
    if not chunk:
        break
    sys.stdout.write(chunk[0])
Unfortunately we can't use sys.stdin.seek to avoid reading the entire file.
Edit: I'd like to optimize for the case when n is a significant fraction of the file size. For example, I often use this utility to sample 500 bytes at equally-spaced locations from a large file.
Upvotes: 2
Views: 1215
Reputation: 845
NOTE: The OP changed the example n from 100 to 100000000, which effectively makes my code slower than his. Normally I would just delete my answer, since it is no longer better than the original example, but it has received a vote, so I will leave it as it is.
The only way I can think of to make it faster is to read everything at once and use a slice:
#!/usr/bin/env python
import sys
n = int(sys.argv[1])
data = sys.stdin.read()
print(data[::n])
Although, trying to fit a 3GB file into RAM might be a very bad idea.
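If memory is the concern, here is a rough sketch of the same slicing idea applied one block at a time, so that only a bounded amount of data is held at once. It assumes Python 3 and a binary stream such as sys.stdin.buffer; the function name every_nth and the 1 MiB block size are my own choices, not anything from the question. It also tries seeking first, since stdin is seekable when it is redirected from a regular file (script.py 1000 < big.bin) but not when it is a pipe. I have not timed it against the 3GB case.

```python
def every_nth(stream, n, block=1 << 20):
    """Yield bytes objects holding the bytes at offsets 0, n, 2n, ... of a binary stream.

    `every_nth` and the default `block` size (1 MiB) are illustrative choices,
    not part of the original question or answer.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")

    if stream.seekable():
        # Regular file redirected to stdin: jump straight to each sample.
        pos = stream.tell()
        while True:
            stream.seek(pos)
            byte = stream.read(1)
            if not byte:  # reading at or past EOF returns b""
                break
            yield byte
            pos += n
    else:
        # Pipe: every byte has to be consumed, but only `block` bytes are kept at a time.
        skip = 0  # offset of the next sample relative to the start of the next chunk
        while True:
            chunk = stream.read(block)
            if not chunk:
                break
            if skip < len(chunk):
                picked = chunk[skip::n]
                yield picked
                # Next sample sits at skip + len(picked) * n within this chunk's numbering.
                skip = skip + len(picked) * n - len(chunk)
            else:
                skip -= len(chunk)
```

To use it like the script in the question, write each piece from every_nth(sys.stdin.buffer, n) to sys.stdout.buffer. With a large n and a redirected file, the seek branch touches only a small part of the input; with a pipe it still has to read everything, which is probably unavoidable.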
Upvotes: 2