tba

Reputation: 6571

Output every nth byte of stdin

What's the easiest efficient way to read from stdin and output every nth byte? I'd like a command-line utility that works on OS X, and would prefer to avoid compiled languages.

This Python script is fairly slow (25s for a 3GB file when n=100000000):

#!/usr/bin/env python
import sys
n = int(sys.argv[1])
while True:
    chunk = sys.stdin.read(n)
    if not chunk:
        break
    sys.stdout.write(chunk[0])

Unfortunately we can't use sys.stdin.seek to avoid reading the entire file.

Edit: I'd like to optimize for the case when n is a significant fraction of the file size. For example, I often use this utility to sample 500 bytes at equally-spaced locations from a large file.

Upvotes: 2

Views: 1215

Answers (1)

freeforall tousez

Reputation: 845

NOTE: The OP changed the example n from 100 to 100000000, which effectively makes my code slower than theirs. Normally I would just delete my answer, since it is no longer better than the original example, but it has received a vote, so I will leave it as is.


The only way I can think of to make it faster is to read everything at once and use a slice:

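#!/usr/bin/env python
import sys
n = int(sys.argv[1])
# Read all of stdin into memory, then keep every nth byte with a strided slice.
data = sys.stdin.read()
# write() rather than print() so no trailing newline is appended to the output.
sys.stdout.write(data[::n])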

Although trying to fit a 3GB file into RAM might be a very bad idea.
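To sidestep the memory concern, here is a minimal sketch that never holds more than one block in memory. It assumes Python 3 (it uses sys.stdin.buffer for raw bytes). The function name every_nth, the 1 MiB block size, and the argument handling are illustrative choices, not part of the question or of the answer above. When stdin is redirected from a regular file (script.py 100 < big.bin) the stream is usually seekable, so the sketch jumps directly between sample positions. For a pipe it falls back to reading every block and carrying the sample offset from one block to the next.

```python
#!/usr/bin/env python3
import sys

BLOCK = 1 << 20  # 1 MiB per read; an arbitrary size chosen for this sketch


def every_nth(src, dst, n, block=BLOCK):
    """Copy the bytes at offsets 0, n, 2n, ... of binary stream src to dst."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if src.seekable():
        # Regular file: jump straight to each sample and read a single byte.
        pos = src.tell()
        while True:
            src.seek(pos)
            byte = src.read(1)
            if not byte:  # reading at or past end of file returns b""
                break
            dst.write(byte)
            pos += n
    else:
        # Pipe: every byte has to be consumed, but only one block is held.
        offset = 0  # index inside the next block of the next wanted byte
        while True:
            chunk = src.read(block)
            if not chunk:
                break
            dst.write(chunk[offset::n])
            # Carry the sampling phase over to the following block.
            offset = (offset - len(chunk)) % n


# Run as a command-line tool only when exactly one argument (n) is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    every_nth(sys.stdin.buffer, sys.stdout.buffer, int(sys.argv[1]))
    sys.stdout.buffer.flush()
```

With a large n and a seekable input, the work is proportional to the number of samples rather than to the file size. With a pipe, the whole input still has to be read, so the gain there is only in memory use.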

Upvotes: 2
