Reputation: 407
I have millions of rows with ID_log as the unique id. To process the data and save it to another table I am doing something like this:
action = """select ID_log, ID_send_message, ID_send, ID_message, ID_recipient,email, action, action_date from `internal.actions` where ID_log > {} limit 10000""".format(Id_log_number)
actions_full = pandas_gbq.read_gbq(action, project_id='mt-int')
After processing the data I save it to a separate table, and the ID_log I pass in will be the last ID_log of the generated action_log table. But on running the above code multiple times I get the same set of rows again and again. Is there a way to fetch the rows in some pattern so that I don't get the same set of data? I am using time.sleep to avoid pulling data from the cache.
Upvotes: 0
Views: 161
Reputation: 314
It's supposed to give you the same results, because it's the same query with the same conditions. If you want to read data in chunks from a Google BigQuery table so you don't overload the server's memory, try Google's pagination for BigQuery (documentation).
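For example, a minimal sketch using the google-cloud-bigquery client rather than pandas_gbq; the project and table names come from the question, everything else (including the process call) is an assumption:

from google.cloud import bigquery
import pandas as pd

client = bigquery.Client(project="mt-int")

sql = """
    SELECT ID_log, ID_send_message, ID_send, ID_message,
           ID_recipient, email, action, action_date
    FROM `internal.actions`
"""

# result(page_size=...) streams the result set page by page
# instead of pulling everything in one request
rows = client.query(sql).result(page_size=10000)

for page in rows.pages:
    # each page is an iterable of Row objects; turn it into a DataFrame chunk
    chunk = pd.DataFrame([dict(r) for r in page])
    process(chunk)  # process() is a placeholder for your own logic

This way each chunk is fetched on demand and the full result set never has to fit in memory at once.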
Upvotes: 1
Reputation: 6041
You are getting the same data back because your WHERE clause finds the first rows matching the criteria, and then LIMIT cuts the result down to the first 10000 rows, no matter how many more there are. Try removing LIMIT 10000 and you should get a different result every time you update your table.
If you would like to get the last n rows from the database, you can simply sort your results in descending order and then apply a limit to get only the desired number of results, like:
action = "SELECT ID_log, ID_send_message, ID_send, ID_message, ID_recipient,email, action, action_date FROM `internal.actions` WHERE ID_log > {} ORDER BY ID_log DESC LIMIT 10000".format(Id_log_number)
Upvotes: 1