Reputation: 407
I have millions of rows with ID_log as the unique id. To process the data and save it to another table I am doing something like this:
action = """select ID_log, ID_send_message, ID_send, ID_message, ID_recipient,email, action, action_date from `internal.actions` where ID_log > {} limit 10000""".format(Id_log_number)
actions_full = pandas_gbq.read_gbq(action, project_id='mt-int')
After processing the data I save it to a separate table, and the ID_log I pass in will be the last ID_log of the generated action_log table. But on running the above code multiple times I get the same set of rows again and again. Is there a way to fetch the rows in some pattern so that I don't get the same set of data? I am using time.sleep to avoid pulling data from the cache.
Upvotes: 0
Views: 161
Reputation: 314
It's supposed to give you the same results, because it's the same query with the same conditions. If you want to read data in chunks from a Google BigQuery table so you don't overload the server's memory, try Google's pagination for BigQuery (documentation).
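For example, a minimal sketch using the google-cloud-bigquery client rather than pandas_gbq; the project and table names come from the question, everything else (including the process call) is an assumption:

from google.cloud import bigquery
import pandas as pd

client = bigquery.Client(project="mt-int")

sql = """
    SELECT ID_log, ID_send_message, ID_send, ID_message,
           ID_recipient, email, action, action_date
    FROM `internal.actions`
"""

# result(page_size=...) streams the result set page by page
# instead of pulling everything in one request
rows = client.query(sql).result(page_size=10000)

for page in rows.pages:
    # each page is an iterable of Row objects; turn it into a DataFrame chunk
    chunk = pd.DataFrame([dict(r) for r in page])
    process(chunk)  # process() is a placeholder for your own logic

This way each chunk is fetched on demand and the full result set never has to fit in memory at once.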
Upvotes: 1
Reputation: 6041
You are getting the same data back because your WHERE clause finds the first rows matching the criteria, and then LIMIT cuts the result down to the first 10000 rows, no matter how many more there are. Try removing LIMIT 10000 and you should get a different result every time you update your table.
If you would like to get the last n rows from the database, you can simply sort your results in descending order and then apply a limit to get only the desired number of results, like:
action = "SELECT ID_log, ID_send_message, ID_send, ID_message, ID_recipient,email, action, action_date FROM `internal.actions` WHERE ID_log > {} ORDER BY ID_log DESC LIMIT 10000".format(Id_log_number)
Upvotes: 1