Reputation: 11
I am using Splunk to analyse SFGov open data (data.sfgov.org), which is a Socrata system.
I am able to download the JSON data and analyse it offline. I am now implementing automated daily indexing of updates to the datasets.
I am trying to figure out which Socrata API fields to actually use to get the new records since my last poll.
I know I can use the $where URL option to filter against the :created_at and :updated_at fields, but is there a row ID, last index, or something like that? I will maintain local state on the Splunk side, for example the last fetched row.
For instance, if the last row I got last night was 18104, then for tonight's check I will ask for rows posted > 18104.
Thanks in advance! I am using python for the automation.
------ added 11/02/2016 ---
Currently I am manually testing this type of GET request (using hurl.it):
https://data.sfgov.org/resource/nwsr-z4mh.json?$where=:created_at between '2016-10-23T18:00:00' and '2016-11-03T00:00:00'&$order=:created_at DESC&$select=:*, *
So if I were to put this into Python, I would simply need to save the previous fetch date-time, do a 'between ... and ...' query, and hope to get the latest created records.
I would prefer a way to refer to a row number, but I don't know how to use the ":id":"row-8aiu.d5x4~8rdi" field yet.
Upvotes: 1
Views: 839
Reputation: 1566
It looks like you're doing the right thing already. You'd just want to save the latest :created_at or :updated_at and use that in your $where for your following query.
You can't do a $where=:updated_at > :row-... because row IDs are identifiers, not datetimes.
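As a rough sketch of that approach in Python, something like the following could work. The helper names (build_url, newest_created_at, fetch_new_rows) are made up for illustration, and it assumes the :created_at values come back as ISO 8601 strings in one consistent format, so that plain string comparison sorts them correctly and the saved value can be passed straight back into $where. It also ignores paging, so if more rows arrive between polls than the endpoint returns in one response, you would need to add $limit/$offset handling.

```python
import json
import urllib.parse
import urllib.request

# Dataset endpoint taken from the question.
BASE_URL = "https://data.sfgov.org/resource/nwsr-z4mh.json"


def build_url(last_seen, base_url=BASE_URL):
    """Build a URL asking for rows created after `last_seen`.

    `last_seen` is the :created_at string saved from the previous poll.
    """
    params = {
        "$select": ":*, *",                         # system fields plus data fields
        "$where": f":created_at > '{last_seen}'",   # only rows newer than the checkpoint
        "$order": ":created_at ASC",
    }
    return base_url + "?" + urllib.parse.urlencode(params)


def newest_created_at(rows, fallback):
    """Return the largest :created_at in `rows`, or `fallback` if there are none.

    Assumes all timestamps share one ISO 8601 format, so string order
    matches chronological order.
    """
    stamps = [row[":created_at"] for row in rows if ":created_at" in row]
    return max(stamps, default=fallback)


def fetch_new_rows(last_seen):
    """Fetch rows newer than `last_seen`; return (rows, new_checkpoint)."""
    with urllib.request.urlopen(build_url(last_seen)) as resp:
        rows = json.load(resp)
    return rows, newest_created_at(rows, last_seen)
```

Each night you would call fetch_new_rows with the checkpoint saved on the Splunk side, index the returned rows, and then store the new checkpoint for the next run. To pick up edited rows as well as new ones, the same pattern should work with :updated_at in place of :created_at.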
Upvotes: 2