Simd
Simd

Reputation: 21343

Counting the number of points in a sliding range

I have a large sorted file with one number per line. I want to output the number of numbers in each range of some size. So for example taking the range to be 10 and the input as

1
4
34
37
42
48
53

The output should be [2, 1, 3, 2, 2, 2, 1]. This is the length of [1,4], [4], [34,37,42], [37,42], [42,48], [48,53], [53]. I think this can be solved using deque but, mostly as a programming exercise and also to use less memory, I am trying to solve by opening the file twice and have two different pointers into the file. One to read in the next left hand end of a list and one to read in the new right hand ends. So my code has

fin1 = open(file, 'r')
fin2 = open(file, 'r')

I think loop over fin1 and when the range gets too big, I read in more of fin2 until the range is small enough and continue stepping down fin1 and fin2.

However I can't get this to work. It seems to not like me to have two file handles open for the same file. How can one do this?

Upvotes: 1

Views: 475

Answers (4)

roman
roman

Reputation: 117520

EDIT: my previous implementation is deque solution (and not perfect one). Here's solution with 2 file pointers:

def sliding_ranges(filename=r"C:\Temp\data.txt", N=10):
    f1, f2 = open(filename), open(filename)
    k, res, i1, i2, r1, r2 = 1, [], 0, 0, 1, 1
    while True:
        while r2 and (not i2 or i2 - i1 < N):
            r2, k = f2.readline(), k + 1
            if r2: i2 = int(r2)

        while r1 and (not i1 or not r2 or i2 - i1 > N):
            r1, k = f1.readline(), k - 1
            if i1: res.append(k)
            if r1: i1 = int(r1)

        if not r1: break

    return res

    >>> sliding_ranges(r"C:\Temp\data.txt", 10)
    [2, 1, 3, 2, 2, 2, 1]

PREVIOUS: here's implementation with one pass. While you're traversing, you keep counting length of lists

f = open(r"d:\temp\data.txt")

d, res, N = [], {}, 10
for j in f:
    i = int(j)
    if i not in res: res[i] = 1
    for k, v in res.items():
        if i - k > N:
            d.append(v)
            del res[k]
        elif k != i:
            res[k] += 1 
d = d + [v for v in res.values()]

here's list of objects in each iteration

d []
res {}

d []
res {1: 1}

d []
res {1: 2, 4: 1}

d [2, 1]
res {34: 1}

d [2, 1]
res {34: 2, 37: 1}

d [2, 1]
res {34: 3, 42: 1, 37: 2}

d [2, 1, 3, 2]
res {42: 2, 48: 1}

d = [2, 1, 3, 2, 2, 2, 1]

Upvotes: 1

FastTurtle
FastTurtle

Reputation: 2311

Here's an implementation, there might be a better way to do it but this should work. I'm assuming the same input you posted in your question.

def ranges(n):
    f = open("tmp.txt")

    while True:
        i = f.tell()
        try:
            curr = int(f.readline().rstrip())
        except ValueError:
            break  # EOF

        j = f.tell()

        while True:
            k = f.tell()  # End of range location
            try:
                next = int(f.readline().rstrip())
            except ValueError:
                break  # EOF

            if next < n or (next - curr) < n:
                continue
            else:
                break

        f.seek(i)  # Go to beginning of range

        r = []
        while f.tell() < k:
            r.append(int(f.readline().strip()))
        print(r)

        f.seek(j)  # Go to line after beginning of range


>>> ranges(10)
[1, 4]
[4]
[34, 37, 42]
[42, 48]
[48, 53]
[53]

Upvotes: 1

Andrew Clark
Andrew Clark

Reputation: 208585

Here is a solution that uses itertools.tee() to simulate reading from handles, but only actually opening one:

from itertools import tee

def sliding_range(file, size):
    fin1, fin2 = tee(int(ln.strip()) for ln in open(file) if ln.strip())
    n = 1
    next(fin2)
    val2 = next(fin2)
    for val1 in fin1:
        while val2 is not None and val2 <= val1 + size:
            n += 1
            try:
                val2 = next(fin2)
            except StopIteration:
                val2 = None
                break
        yield n
        n -= 1

Example (with your example data copied to 'test.txt'):

>>> list(sliding_range('test.txt', 10))
[2, 1, 3, 2, 2, 2, 1]

Upvotes: 3

Omegaman
Omegaman

Reputation: 2329

I'm not sure why you're doing it this way, but to answer your question (which is about file I/O not about counting values) you need one file handle and two file pointers.

Once you open the file with file handle f, f.tell() will tell you what your position is in the file, and f.seek(pos) will move the pointer back to a given position.

f.seek(pos,how) takes an optional second parameter that gives you some flexibility in how the seek is calculated (setting how to 0 seeks from the beginning of the file, 1 from the current position, 2 from the end). This lets you use pos as an offset from the reference, rather than strictly from the beginning.

Upvotes: 0

Related Questions