Reputation: 867
I have a JSON file and would like to write a function to return a list of the next 10 objects in the file. I've started with a class, FileProcessor, and the method get_row(), which returns a generator that yields a single JSON object from the file. Another method, get_chunk(), should return the next 10 objects.
Here is what I have so far:
import json
import os

class FileProcessor(object):
    def __init__(self, filename):
        self.FILENAME = filename

    def get_row(self):
        with open(os.path.join('path/to/file', self.FILENAME), 'r') as f:
            for i in f:
                yield json.loads(i)

    def get_chunk(self):
        pass
I've tried it like this, but it only returns the first 10 rows every time:
def get_chunk(self):
    chunk = []
    consumer = self.consume()
    for i in self.get_row():
        chunk.append(i)
    return chunk
So what is the correct way to write get_chunk()?
Upvotes: 1
Views: 762
Reputation: 77347
(note: PM 2Ring beat me to it!)
Your get_row method doesn't return a row; it returns a generator that will produce rows as you iterate through it. You can see that in the get_chunk method, which does for i in self.get_row.... The annoying thing is that every time you call get_row it will open the file again and start over from the first object. The problem with get_chunk is that you don't pass in the number of rows you want and you don't limit the for loop to that number, so get_chunk gets all of the rows in the file.
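To make the restart behaviour concrete, here is a minimal sketch. The get_row function below is a hypothetical stand-in that yields from an in-memory list rather than your file, but the generator semantics are the same:

```python
def get_row():
    # Hypothetical stand-in for FileProcessor.get_row(): yields rows in order.
    for row in ["row0", "row1", "row2", "row3"]:
        yield row

# Calling the generator function twice builds two independent generators,
# so each one starts again from the first row.
first_call = next(get_row())
second_call = next(get_row())

# Holding on to a single generator preserves the position between reads.
rows = get_row()
first = next(rows)
second = next(rows)

print(first_call, second_call)  # row0 row0
print(first, second)            # row0 row1
```

So whatever reads the chunks has to keep one generator alive between calls instead of asking for a new one each time.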
How about a rethink? All you really need is a generator that reads lines and deserializes the JSON. The map function is already built to do that. You can get a single row with Python's next function and multiple rows with itertools.islice. Your class is just a thin wrapper around stuff that's already implemented, so just use the native tools and skip writing your own class completely.
First I generate a test file:
>>> import itertools
>>> import json
>>> with open('test.json', 'w') as fp:
...     for obj in 'foo', 'bar', 'baz':
...         fp.write(json.dumps(obj) + '\n')
...
Now I can create an iterator that can be used to get a single row or a list of rows. In CPython you can open the file inside the map call safely, because reference counting closes the file once the iterator is discarded, but you can also do your work in a with clause.
>>> json_iter = map(json.loads, open('test.json'))
>>> next(json_iter)
'foo'
>>>
>>> with open('test.json') as fp:
...     json_iter = map(json.loads, fp)
...     next(json_iter)
...
'foo'
I can get all of the objects in a loop
>>> for obj in map(json.loads, open('test.json')):
...     print(obj)
...
foo
bar
baz
Or put some of them in a list
>>> json_iter = map(json.loads, open('test.json'))
>>> list(itertools.islice(json_iter, 2))
['foo', 'bar']
or combine operations
>>> json_iter = map(json.loads, open('test.json'))
>>> for obj in json_iter:
...     if obj == 'foo':
...         list(itertools.islice(json_iter, 2))
...
['bar', 'baz']
The point is, the simple map-based iterator can do what you want, without having to update a wrapper class every time you think of a new use case.
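For the "next 10 objects" case from the question, a rough sketch of the same idea could look like this. The temporary file and the {"id": n} objects are made-up test data, and the chunk size of 10 comes from the question:

```python
import itertools
import json
import os
import tempfile

# Build a small JSON-lines file to read from (hypothetical test data).
path = os.path.join(tempfile.mkdtemp(), "test.json")
with open(path, "w") as fp:
    for n in range(25):
        fp.write(json.dumps({"id": n}) + "\n")

chunk_sizes = []
with open(path) as fp:
    json_iter = map(json.loads, fp)
    while True:
        # islice takes up to 10 objects and leaves json_iter positioned after them.
        chunk = list(itertools.islice(json_iter, 10))
        if not chunk:
            break
        chunk_sizes.append(len(chunk))

print(chunk_sizes)  # [10, 10, 5]
```

Because json_iter is created once and reused, each islice call picks up where the previous one stopped, and the last chunk is simply shorter.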
Upvotes: 2
Reputation: 55479
Here's a simple generator that gets values from another generator and puts them into a list. It should work with your FileProcessor.get_row method.
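def count(n):
    for v in range(n):
        yield str(v)

def chunks(it, n):
    while True:
        try:
            chunk = [next(it) for _ in range(n)]
        except StopIteration:
            # Since Python 3.7 (PEP 479) a StopIteration escaping a generator
            # becomes RuntimeError, so end the generator explicitly.
            return
        yield chunk

for u in chunks(count(100), 12):
    print(u)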
output
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11']
['12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23']
['24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35']
['36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47']
['48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59']
['60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71']
['72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83']
['84', '85', '86', '87', '88', '89', '90', '91', '92', '93', '94', '95']
Note that this only yields complete chunks. If that's a problem, you can do this:
def chunks(it, n):
    while True:
        chunk = []
        for _ in range(n):
            try:
                chunk.append(next(it))
            except StopIteration:
                # Yield the partial final chunk, but never an empty one.
                if chunk:
                    yield chunk
                return
        yield chunk
which will print
['96', '97', '98', '99']
after the previous output.
A better way to do this is to use itertools.islice, which will handle a partial final chunk:
from itertools import islice

def chunks(it, n):
    while True:
        a = list(islice(it, n))
        if not a:
            return
        yield a
Thanks to Antti Haapala for reminding me about islice. :)
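If you do want to keep the class from the question, one possible way to write get_chunk() is to create the row generator once and slice it with islice on every call. This is only a sketch under a few assumptions: the _rows attribute and the size parameter are my additions, the filename is used as a full path instead of being joined to 'path/to/file', and the demo file is made-up test data.

```python
import json
import os
import tempfile
from itertools import islice


class FileProcessor:
    def __init__(self, filename):
        self.filename = filename
        # Create the generator once so that its position survives between calls.
        self._rows = self.get_row()

    def get_row(self):
        with open(self.filename) as f:
            for line in f:
                yield json.loads(line)

    def get_chunk(self, size=10):
        # Returns up to `size` objects; an empty list means the file is exhausted.
        return list(islice(self._rows, size))


# Hypothetical demo data: 25 one-line JSON objects in a temporary file.
path = os.path.join(tempfile.mkdtemp(), "rows.json")
with open(path, "w") as fp:
    for n in range(25):
        fp.write(json.dumps({"id": n}) + "\n")

processor = FileProcessor(path)
first = processor.get_chunk()
second = processor.get_chunk()
third = processor.get_chunk()
fourth = processor.get_chunk()
print(len(first), len(second), len(third), len(fourth))  # 10 10 5 0
print(second[0])  # {'id': 10}
```

The file is only opened when the first chunk is requested, and the with block inside get_row closes it once the last row has been read.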
Upvotes: 2