GSatterwhite

Reputation: 301

SODA API Filtering

I am trying to filter the NY.gov open data portal (data.ny.gov) with its SODA API. I am following the docs on how to filter, but the request returns an empty DataFrame.

import pandas as pd
from sodapy import Socrata


clientNYgov = Socrata('data.ny.gov', None)

Here is where I try to keep only the results whose business city is New York:

databaseM = clientNYgov.get('yg7h-zjbf.csv?business_city=NEW+YORK')

dfDatabaseM = pd.DataFrame.from_records(databaseM)

dfDatabaseM.to_csv('Manhattan Agents.csv')
print(dfDatabaseM)

But this is the (empty) output:

0                   1   ...              9             10
0  business_address_1  business_address_2  ...  license_number  license_type

[1 rows x 11 columns]

Process finished with exit code 0

Please let me know if there's a problem with how I am filtering; I'm not quite sure what is going wrong here. Thanks so much in advance!

Upvotes: 2

Views: 1051

Answers (2)

edesz

Reputation: 12396

There are two ways to apply this filter.

Method 1

This can be done with sodapy's Socrata client by passing the filter as a SoQL (Socrata Query Language) query to the query keyword of the client's get() method. An application token is not strictly required, but without one your requests are subject to throttling. To avoid throttling, sign up for a Socrata account and create an app token.

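import pandas as pd
from sodapy import Socrata

# SoQL string literals use single quotes; no f-string needed since nothing is interpolated
query = "SELECT * WHERE business_city = 'NEW YORK' LIMIT 50000"
client = Socrata("data.ny.gov", "YOUR-APP-TOKEN-HERE")  # placeholder: your own app token, or None
results = client.get("yg7h-zjbf", query=query)
df_socrata = pd.DataFrame.from_records(results)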
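Passing the bare dataset identifier (yg7h-zjbf) to get() is what matters here. A plausible explanation for the question's result, assuming sodapy's get() builds the request path as /resource/{identifier}.{content_type} with content_type defaulting to "json": the query string embedded in the identifier ends up with ".json" glued onto the filter value, so the server is asked for business_city = "NEW YORK.json", matches nothing, and (because of the ".csv" in the path) presumably answers with a CSV holding only the header row, which is exactly the single row of column names shown in the question. The sketch below reproduces that string handling offline with a hypothetical helper, sodapy_style_path, that mimics the assumed behaviour; it does not call sodapy itself.

```python
from urllib.parse import parse_qs, urlsplit


def sodapy_style_path(dataset_identifier: str, content_type: str = "json") -> str:
    """Hypothetical mimic of the path format sodapy's get() is assumed to use."""
    return "/resource/{}.{}".format(dataset_identifier, content_type)


# What the question effectively passed as the dataset identifier:
broken = sodapy_style_path("yg7h-zjbf.csv?business_city=NEW+YORK")
parts = urlsplit("https://data.ny.gov" + broken)
print(parts.path)             # /resource/yg7h-zjbf.csv
print(parse_qs(parts.query))  # {'business_city': ['NEW YORK.json']}

# With the bare identifier the path is clean, and filters travel separately.
print(sodapy_style_path("yg7h-zjbf"))  # /resource/yg7h-zjbf.json
```

sodapy's get() also appears to forward extra keyword arguments as simple field filters, so client.get("yg7h-zjbf", business_city="NEW YORK", limit=50000) should be an equivalent shortcut; it is worth confirming against the sodapy version you have installed.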

Method 2

Using the JSON endpoint (same as @Joseph Gattuso's answer)

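import pandas as pd
import requests

# Pass the filters as params so requests handles the URL encoding (the space, the "$")
resp = requests.get(
    "https://data.ny.gov/resource/yg7h-zjbf.json",
    params={"$limit": 50000, "business_city": "NEW YORK"},
)
resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
data = resp.json()
df = pd.DataFrame.from_records(data)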
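To see exactly what URL the Method 2 request sends, without touching the network, you can build it with requests.Request(...).prepare() and inspect .url. Passing the filters through params lets requests do the encoding: the space in NEW YORK becomes "+" and "$" is percent-encoded as %24, which the server decodes back. The endpoint and field name are the ones used in this thread; nothing is actually sent.

```python
from urllib.parse import parse_qs, urlsplit

import requests

# Build the same GET request, but only prepare it; no HTTP call is made.
prepared = requests.Request(
    "GET",
    "https://data.ny.gov/resource/yg7h-zjbf.json",
    params={"$limit": 50000, "business_city": "NEW YORK"},
).prepare()

print(prepared.url)  # "$" appears as %24 and the space as "+"
# Decoding the query string recovers the original filter values.
print(parse_qs(urlsplit(prepared.url).query))
```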

Comparison of output: verify that the two methods return the same result

assert df_socrata.equals(df)
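DataFrame.equals also compares row order and column order. Neither request above asks for a particular ordering, so the rows may not come back in the same sequence, and the assertion could then fail even when both methods return the same records. A minimal order-insensitive check, written as a hypothetical helper named same_rows:

```python
import pandas as pd


def same_rows(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """Compare two frames ignoring row order, column order and index labels."""
    if sorted(a.columns) != sorted(b.columns):
        return False
    cols = sorted(a.columns)
    # Align column order, sort rows by every column, then drop the old index.
    a_sorted = a[cols].sort_values(cols).reset_index(drop=True)
    b_sorted = b[cols].sort_values(cols).reset_index(drop=True)
    return a_sorted.equals(b_sorted)


x = pd.DataFrame({"business_city": ["NEW YORK", "NEW YORK"], "license_number": ["2", "1"]})
y = pd.DataFrame({"license_number": ["1", "2"], "business_city": ["NEW YORK", "NEW YORK"]})
print(x.equals(y))      # False: columns and rows are in a different order
print(same_rows(x, y))  # True
```

With real data this would be used as assert same_rows(df_socrata, df). Sorting on every column keeps the helper generic; if the dataset has a unique key, sorting on that key alone is cheaper.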

Upvotes: 1

Joseph Gattuso

Reputation: 11

Socrata exposes each dataset through a JSON API endpoint, which you can find by clicking API in the top-right corner of the dataset page. For this solution I use just requests to retrieve the data. The sodapy module is convenient, but it does the same thing as a plain HTTP request.

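import pandas as pd
import requests

url = "https://data.ny.gov/resource/yg7h-zjbf.json"
params = {"$limit": 50000, "business_city": "NEW YORK"}  # $limit raises the default 1,000-row cap
data = requests.get(url, params=params).json()
df = pd.DataFrame.from_records(data)
df  # displays the frame in a notebook; use print(df) in a script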
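The business_city=NEW YORK parameter is an exact-match shortcut. When a filter needs a SoQL $where clause instead, string values have to be written as SoQL literals, which use single quotes (double quotes are not the documented form). A minimal sketch with two hypothetical helpers, assuming SoQL escapes an embedded single quote by doubling it; the resulting dict could be passed as params= to requests.get against the same JSON endpoint.

```python
def soql_string(value: str) -> str:
    """Quote a Python string as a SoQL string literal (assumes '' escapes a quote)."""
    return "'" + value.replace("'", "''") + "'"


def where_equals(field: str, value: str) -> str:
    """Build a simple $where equality clause such as business_city = 'NEW YORK'."""
    return f"{field} = {soql_string(value)}"


params = {"$limit": 50000, "$where": where_equals("business_city", "NEW YORK")}
print(params["$where"])       # business_city = 'NEW YORK'
print(soql_string("O'NEIL"))  # 'O''NEIL'
```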

Upvotes: 1
