Gustavo Bezerra

Reputation: 11034

How to retrieve more than 10k lines from InfluxDB using Pandas?

I am trying to use InfluxDB's Python client to retrieve data stored in InfluxDB, but I can't get more than 10k lines. The examples I am (unsuccessfully) following are here. In summary:

import influxdb
dfclient = influxdb.DataFrameClient('localhost', 8086, 'root', 'root', 'mydb')
q = "select * from some_measurement"
df = dfclient.query(q, chunked=True)  # Returns only 10k points

The issue seems to be related to InfluxDB's internal limitations documented here (namely, the max-row-limit configuration option). I am going through the sources to try to find out how to get a DataFrame larger than 10k lines, but any help in solving this issue would be highly appreciated.

Upvotes: 9

Views: 10582

Answers (2)

Mohammad Ali

Reputation: 922

Have you tried setting the chunked flag on your query so that the data comes back in chunks? The client is created as shown below, and chunked is then passed to its query method:

influxdb.DataFrameClient(host='localhost', port=8086, username='root', password='root', database=None, ssl=False, verify_ssl=False, timeout=None, use_udp=False, udp_port=4444, proxies=None)

You can read more about it here, in section 1.2.3.

Upvotes: 3

Gustavo Bezerra

Reputation: 11034

The problem is caused by the DataFrameClient's query simply ignoring the chunked argument [code].

The workaround I found is to use the standard InfluxDBClient instead. The code shown in the question becomes:

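import influxdb
import pandas as pd

# Plain client instead of DataFrameClient, so that chunked is honoured
client = influxdb.InfluxDBClient('localhost', 8086, 'root', 'root', 'mydb')
q = "select * from some_measurement"
# Build the DataFrame from the points of the chunked result
df = pd.DataFrame(client.query(q, chunked=True, chunk_size=10000).get_points())  # Returns all points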

It is also worth highlighting that, as of InfluxDB v1.2.2, the max-row-limit setting (i.e. the default value for chunk_size in the above code) has been changed from 10k to unlimited.
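
One caveat, offered as a sketch rather than a definitive fix: as far as I can tell, newer releases of the Python client may return a generator of result sets (one per chunk) from query(..., chunked=True) instead of a single result set, in which case calling .get_points() directly on the return value would fail. The helper below copes with both shapes. The name iter_points and the FakeResultSet class are hypothetical and exist only to make the example runnable without a database; the only client behaviour assumed is that each result set exposes get_points().

```python
def iter_points(result):
    """Yield point dicts from a query result, whatever its shape.

    Assumption: `result` is either one ResultSet-like object exposing
    get_points(), or an iterable of such objects (one per chunk).
    """
    if hasattr(result, "get_points"):
        # Single result set: all chunks were already merged by the client.
        yield from result.get_points()
    else:
        # Iterable of result sets: walk the chunks one at a time.
        for chunk in result:
            yield from chunk.get_points()


class FakeResultSet:
    """Hypothetical stand-in for the client's result set (demo only)."""

    def __init__(self, points):
        self._points = points

    def get_points(self):
        return iter(self._points)


# Same two points, delivered once as a single result set and once as chunks.
single = FakeResultSet([{"time": 1, "value": 10}, {"time": 2, "value": 20}])
chunks = (FakeResultSet([{"time": t, "value": t * 10}]) for t in (1, 2))

rows_from_single = list(iter_points(single))
rows_from_chunks = list(iter_points(chunks))
print(rows_from_single == rows_from_chunks)
```

With a real client, the last line of the workaround would then read pd.DataFrame(list(iter_points(client.query(q, chunked=True, chunk_size=10000)))), which should behave the same regardless of which shape the installed client version returns.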

Upvotes: 10
