Reputation: 81
I'm trying to export all users who have registered on my website through Janrain, using Python. From the Janrain documentation, it looks like entity.find is the best call to get the data, so I wrote the following code:
get_user = api.call(
    "entity.find",
    type_name="user",
)
However, this call returns only 100 rows of data. I know there is another parameter called max_results, but it accepts at most 10000 records.
So, how do I use the API to export all of my user data without hitting the row limit?
Thank you!
Upvotes: 1
Views: 570
Reputation: 374
You will have to export the data in batches.
As you noted, the entity.find call takes a max_results parameter. You can try setting it to a high value, but in most cases the response will exceed the payload limits and/or API timeout restrictions, and the API call will fail.
Janrain recommends stepping through sets of values with the first_result and max_results parameters, for example, in batches of 1000:
first_result=0&max_results=1000
first_result=1000&max_results=1000
first_result=2000&max_results=1000
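As a rough illustration, the batches above could be requested in a loop like the one below. This is only a sketch: `export_in_batches` is a hypothetical helper name, `call` stands for whatever callable you already use (such as the `api.call` from the question), and it assumes the parsed response is a dict whose records sit under a `"results"` key.

```python
def export_in_batches(call, type_name="user", batch_size=1000):
    """Yield records by stepping first_result forward in fixed-size batches.

    `call` is assumed to behave like the api.call from the question and to
    return a dict with a "results" list (an assumption about the response).
    """
    first_result = 0
    while True:
        response = call(
            "entity.find",
            type_name=type_name,
            first_result=first_result,
            max_results=batch_size,
        )
        records = response.get("results", [])
        if not records:
            # An empty batch means we have stepped past the last record.
            break
        for record in records:
            yield record
        first_result += batch_size
```

With the client from the question, usage might look like `for user in export_in_batches(api.call): ...`.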
Retrieve a Large Number of Entities Efficiently
If you are retrieving groups of records, it is possible that someone else will delete a record in one of the groups that you have already retrieved. Because entity.find counts each group of records from the beginning of the list, your next group may miss a record that filled in the space of the deleted record. To avoid such issues, follow these best practices:
When gathering large groups of records: Given n, the maximum number of results to be returned (1000 is a good place to start; 10000 is the maximum), and f, the record query filter, use the parameters:
- sort_on=["id"]
- max_results=n
Then:
1. Call entity.find with filter=f
2. Let x be the id of the last record in the result set
3. Call entity.find with filter=f and id > x
4. If the result set is not empty, go to step 2
This results in a fast search, with no chance of missing any records. If you do not wish to use a query filter, omit the filter parameter from the call in step 1, and use filter=id > x in step 3.
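The steps above could be sketched in Python roughly as follows. Treat it as an illustration under assumptions rather than Janrain's own code: `export_all` is a hypothetical helper, `call` is whatever callable you use to reach the API (for example the `api.call` from the question), the response is assumed to be a dict with a `"results"` list whose records each carry an `"id"`, and `sort_on` is passed as JSON text on the assumption that the endpoint expects a JSON array.

```python
import json


def export_all(call, type_name="user", base_filter=None, batch_size=1000):
    """Yield every record, paging by id instead of by offset."""
    last_id = None
    while True:
        clauses = []
        if base_filter:
            # Parentheses keep the caller's filter separate from the id
            # clause (assumes the filter syntax accepts parentheses).
            clauses.append("(" + base_filter + ")")
        if last_id is not None:
            clauses.append("id > {}".format(last_id))
        params = {
            "type_name": type_name,
            "sort_on": json.dumps(["id"]),  # assumed to be sent as JSON text
            "max_results": batch_size,
        }
        if clauses:
            params["filter"] = " and ".join(clauses)
        response = call("entity.find", **params)
        records = response.get("results", [])  # assumed response shape
        if not records:
            break  # step 4: stop once a result set comes back empty
        for record in records:
            yield record
        last_id = records[-1]["id"]  # step 2: id of the last record seen
```

Because each request starts after the last id already seen, a record deleted from an earlier batch does not shift the records in later batches.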
Note: Janrain also recommends avoiding the use of the “show_total_count” parameter in most use cases as it comes with a significant performance penalty. Additionally, if the system is a live production system with a relatively large number of active registrations, the total number of records may be different at the end of the export process when compared to the starting total.
Upvotes: 2