Reputation: 21343
I have a large sorted file with one number per line. I want to output the number of numbers in each range of some size. So for example taking the range to be 10 and the input as
1
4
34
37
42
48
53
The output should be [2, 1, 3, 2, 2, 2, 1]
. This is the length of [1,4], [4], [34,37,42], [37,42], [42,48], [48,53], [53]
. I think this can be solved using deque
but, mostly as a programming exercise and also to use less memory, I am trying to solve by opening the file twice and have two different pointers into the file. One to read in the next left hand end of a list and one to read in the new right hand ends. So my code has
fin1 = open(file, 'r')
fin2 = open(file, 'r')
I think loop over fin1 and when the range gets too big, I read in more of fin2 until the range is small enough and continue stepping down fin1 and fin2.
However I can't get this to work. It seems to not like me to have two file handles open for the same file. How can one do this?
Upvotes: 1
Views: 475
Reputation: 117520
EDIT: my previous implementation is deque solution (and not perfect one). Here's solution with 2 file pointers:
def sliding_ranges(filename=r"C:\Temp\data.txt", N=10):
f1, f2 = open(filename), open(filename)
k, res, i1, i2, r1, r2 = 1, [], 0, 0, 1, 1
while True:
while r2 and (not i2 or i2 - i1 < N):
r2, k = f2.readline(), k + 1
if r2: i2 = int(r2)
while r1 and (not i1 or not r2 or i2 - i1 > N):
r1, k = f1.readline(), k - 1
if i1: res.append(k)
if r1: i1 = int(r1)
if not r1: break
return res
>>> sliding_ranges(r"C:\Temp\data.txt", 10)
[2, 1, 3, 2, 2, 2, 1]
PREVIOUS: here's implementation with one pass. While you're traversing, you keep counting length of lists
f = open(r"d:\temp\data.txt")
d, res, N = [], {}, 10
for j in f:
i = int(j)
if i not in res: res[i] = 1
for k, v in res.items():
if i - k > N:
d.append(v)
del res[k]
elif k != i:
res[k] += 1
d = d + [v for v in res.values()]
here's list of objects in each iteration
d []
res {}
d []
res {1: 1}
d []
res {1: 2, 4: 1}
d [2, 1]
res {34: 1}
d [2, 1]
res {34: 2, 37: 1}
d [2, 1]
res {34: 3, 42: 1, 37: 2}
d [2, 1, 3, 2]
res {42: 2, 48: 1}
d = [2, 1, 3, 2, 2, 2, 1]
Upvotes: 1
Reputation: 2311
Here's an implementation, there might be a better way to do it but this should work. I'm assuming the same input you posted in your question.
def ranges(n):
f = open("tmp.txt")
while True:
i = f.tell()
try:
curr = int(f.readline().rstrip())
except ValueError:
break # EOF
j = f.tell()
while True:
k = f.tell() # End of range location
try:
next = int(f.readline().rstrip())
except ValueError:
break # EOF
if next < n or (next - curr) < n:
continue
else:
break
f.seek(i) # Go to beginning of range
r = []
while f.tell() < k:
r.append(int(f.readline().strip()))
print(r)
f.seek(j) # Go to line after beginning of range
>>> ranges(10)
[1, 4]
[4]
[34, 37, 42]
[42, 48]
[48, 53]
[53]
Upvotes: 1
Reputation: 208585
Here is a solution that uses itertools.tee()
to simulate reading from handles, but only actually opening one:
from itertools import tee
def sliding_range(file, size):
fin1, fin2 = tee(int(ln.strip()) for ln in open(file) if ln.strip())
n = 1
next(fin2)
val2 = next(fin2)
for val1 in fin1:
while val2 is not None and val2 <= val1 + size:
n += 1
try:
val2 = next(fin2)
except StopIteration:
val2 = None
break
yield n
n -= 1
Example (with your example data copied to 'test.txt'):
>>> list(sliding_range('test.txt', 10))
[2, 1, 3, 2, 2, 2, 1]
Upvotes: 3
Reputation: 2329
I'm not sure why you're doing it this way, but to answer your question (which is about file I/O not about counting values) you need one file handle and two file pointers.
Once you open the file with file handle f
, f.tell()
will tell you what your position is in the file, and f.seek(pos)
will move the pointer back to a given position.
f.seek(pos,how)
takes an optional second parameter that gives you some flexibility in how the seek is calculated (setting how
to 0 seeks from the beginning of the file, 1 from the current position, 2 from the end). This lets you use pos
as an offset from the reference, rather than strictly from the beginning.
Upvotes: 0