tamasgal
tamasgal

Reputation: 26309

Limiting the number of rows returned by `.where(...)` in pytables

I am dealing with tables having having up to a few billion rows and I do a lot of "where(numexpr_condition)" lookups using pytables.

We managed to optimise the HDF5 format so a simple where-query over 600mio rows is done under 20s (we still struggling to find out how to make this faster, but that's another story).

However, since it is still too slow for playing around, I need a way to limit the number of results in a query like this simple example one (the foo column is of course indexed):

[row['bar'] for row in table.where('(foo == 234)')]

So this would return lets say 100mio entries and it takes 18s, which is way to slow for prototyping and playing around.

How would you limit the result to lets say 10000?

The database like equivalent query would be roughly:

SELECT bar FROM row WHERE foo==234 LIMIT 10000

Using the stop= attribute is not the way, since it simply takes the first n rows and applies the condition to them. So in worst case if the condition is not fulfilled, I get an empty array:

[row['bar'] for row in table.where('(foo == 234)', stop=10000)]

Using slice on the list comprehension is also not the right way, since it will first create the whole array and then apply the slice, which of course is no speed gain at all:

[row['bar'] for row in table.where('(foo == 234)')][:10000]

However, the iterator must know its own size while the list comprehension exhaustion so there is surely a way to hack this together. I just could not find a suitable way doing that.

Btw. I also tried using zip and range to force a StopIteration:

[row['bar'] for for _, row in zip(range(10000), table.where('(foo == 234)'))]

But this gave me repeated numbers of the same row.

Upvotes: 2

Views: 234

Answers (1)

Ry-
Ry-

Reputation: 225074

Since it’s an iterable and appears to produce rows on demand, you should be able to speed it up with itertools.islice.

rows = list(itertools.islice(table.where('(foo == 234)'), 10000))

Upvotes: 1

Related Questions