Subbu Vincent

Reputation: 11

How do I get the most recent rows in Socrata SODA datasets since the last query?

I am using Splunk to analyse SFGov open data (data.sfgov.org), which is a Socrata system.

I am able to download the JSON data and analyse it offline. I am now implementing automated daily indexing of updates to the datasets.

I am trying to figure out which Socrata API fields to actually use to get the new records since my last poll.

I know I can use the $where URL option to filter against the :created_at and :updated_at fields, but is there a row ID, last index, or something similar? I will maintain local state on the Splunk side, for example the last fetched row.

For example, if the last row I got last night was 18104, then for tonight's check I would ask for rows posted after 18104.

Thanks in advance! I am using Python for the automation.

------ added 11/02/2016 ---

Currently I am manually testing this type of GET request (using hurl.it):

https://data.sfgov.org/resource/nwsr-z4mh.json?$where=:created_at between '2016-10-23T18:00:00' and '2016-11-03T00:00:00'&$order=:created_at DESC&$select=:*, *

So if I were to put this into Python, I would simply need to save the previous fetch date-time, run a 'between' query from that time to the current time, and hope to get the latest created records.
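A minimal sketch of how that GET could be assembled in Python, assuming only the standard library. The helper name build_between_url is hypothetical; the endpoint and the $where, $order and $select values are the ones from the URL above. The hand-written URL contains spaces and quotes, so the sketch lets urlencode percent-encode the parameters.

```python
from urllib.parse import urlencode

# Endpoint taken from the manual test above.
BASE_URL = "https://data.sfgov.org/resource/nwsr-z4mh.json"


def build_between_url(start, end, base_url=BASE_URL):
    """Build the 'between' query URL (hypothetical helper, not a Socrata API).

    start and end are timestamps formatted like '2016-10-23T18:00:00'.
    """
    params = {
        "$where": f":created_at between '{start}' and '{end}'",
        "$order": ":created_at DESC",
        "$select": ":*, *",
    }
    # urlencode escapes spaces, quotes, colons and the leading '$'.
    return base_url + "?" + urlencode(params)


url = build_between_url("2016-10-23T18:00:00", "2016-11-03T00:00:00")
```

The resulting string can be passed to any HTTP client; only the two timestamps change between runs.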

I would prefer a way to refer to a row number, but I don't know how to use the ":id":"row-8aiu.d5x4~8rdi" field yet.

Upvotes: 1

Views: 839

Answers (1)

chrismetcalf

Reputation: 1566

It looks like you're doing the right thing already. You'd just want to save the latest :created_at or :updated_at and use that in your $where for your following query.

You can't do a $where=:updated_at > :row-... because row IDs are identifiers, not datetimes.
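As a rough sketch of the save-and-reuse approach, assuming Python 3 and only the standard library: the script keeps the newest :created_at it has seen in a small state file and uses it in the next $where. The file name and all function names are hypothetical, and the sketch assumes the rows come back with :created_at as ISO 8601 strings in one consistent format, so that plain string comparison orders them correctly. Paging with $limit and $offset is left out, and the stored timestamp format may need adjusting before the API accepts it in $where.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://data.sfgov.org/resource/nwsr-z4mh.json"
STATE_FILE = "last_created_at.txt"  # hypothetical location for the saved state


def load_checkpoint(path=STATE_FILE, default="1970-01-01T00:00:00"):
    """Return the last saved :created_at, or a default on the first run."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return f.read().strip() or default


def save_checkpoint(value, path=STATE_FILE):
    """Persist the newest :created_at for the next poll."""
    with open(path, "w") as f:
        f.write(value)


def next_checkpoint(rows, last_seen):
    """Newest :created_at among rows, or last_seen if nothing new arrived.

    Assumes timestamps are ISO 8601 strings in a consistent format.
    """
    stamps = [row[":created_at"] for row in rows if ":created_at" in row]
    return max(stamps + [last_seen])


def fetch_new_rows(last_seen, base_url=BASE_URL):
    """Ask for rows created after last_seen (performs a network call)."""
    params = {
        "$where": f":created_at > '{last_seen}'",
        "$order": ":created_at ASC",
        "$select": ":*, *",
    }
    with urlopen(base_url + "?" + urlencode(params)) as resp:
        return json.load(resp)


def poll_once(path=STATE_FILE):
    """One polling cycle: load state, fetch, then save the new state."""
    last_seen = load_checkpoint(path)
    rows = fetch_new_rows(last_seen)
    save_checkpoint(next_checkpoint(rows, last_seen), path)
    return rows
```

Calling poll_once() from a nightly job would return only the rows created since the previous run. Swapping :created_at for :updated_at in both places would pick up modified rows as well.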

Upvotes: 2
