Reputation: 301
I am trying to filter the data.ny.gov open data portal through its SODA API. I am following the docs on how to filter, but the result is an empty DataFrame.
# noinspection PyUnresolvedReferences
import numpy as np
# noinspection PyUnresolvedReferences
import pandas as pd
# noinspection PyUnresolvedReferences
from sodapy import Socrata
clientNYgov = Socrata('data.ny.gov', None)
Here is where I try to return only the results for New York:
databaseM = clientNYgov.get('yg7h-zjbf.csv?business_city=NEW+YORK')
dfDatabaseM = pd.DataFrame.from_records(databaseM)
dfDatabaseM.to_csv('Manhattan Agents.csv')
print(dfDatabaseM)
But here is the empty output:
0 1 ... 9 10
0 business_address_1 business_address_2 ... license_number license_type
[1 rows x 11 columns]
Process finished with exit code 0
Please let me know if there's a problem with how I am filtering; I'm not quite sure what is going wrong here. Thanks so much in advance!
Upvotes: 2
Views: 1051
Reputation: 12396
There are two approaches to do this with filters.
Method 1
This can be done with Socrata() by passing the filters as a SoQL (SQL-like) statement to the query keyword of the get() method of the instantiated Socrata client. An application token is not strictly required, but without one your requests are subject to throttling. To avoid throttling, sign up for a Socrata account and create an app token.
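import pandas as pd
from sodapy import Socrata
# SoQL string literals take single quotes.
query = "SELECT * WHERE business_city = 'NEW YORK' LIMIT 50000"
# Replace the placeholder string with your own app token.
client = Socrata("data.ny.gov", "YOUR-APP-TOKEN-HERE")
results = client.get("yg7h-zjbf", query=query)
df_socrata = pd.DataFrame.from_records(results)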
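If you would rather not write the whole SELECT statement by hand, sodapy's get() also accepts where and limit keyword arguments, which it sends as the $where and $limit parameters. The sketch below is only illustrative: soql_equals and fetch_filtered are hypothetical helper names, and escaping a single quote by doubling it is my assumption about SoQL string literals, so check it against the SoQL docs before relying on it.

```python
def soql_equals(column: str, value: str) -> str:
    """Return a SoQL equality clause such as: business_city = 'NEW YORK'."""
    # Assumption: a single quote inside a SoQL string literal is escaped
    # by doubling it.
    escaped = value.replace("'", "''")
    return f"{column} = '{escaped}'"


def fetch_filtered(client, dataset_id: str, column: str, value: str,
                   limit: int = 50000):
    """Fetch rows where `column` equals `value`.

    `client` is expected to be a sodapy.Socrata instance; any object with a
    compatible get() method works, which keeps the helper testable offline.
    """
    return client.get(dataset_id, where=soql_equals(column, value), limit=limit)
```

With the client from above, usage would look like results = fetch_filtered(client, "yg7h-zjbf", "business_city", "NEW YORK"), and the result can go into pd.DataFrame.from_records(results) as before.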
Method 2
Using the JSON endpoint (same as @Joseph Gattuso's answer)
import requests

data = requests.get(
    "http://data.ny.gov/resource/yg7h-zjbf.json?"
    "$limit=50000&"
    "business_city=NEW YORK"
).json()
df = pd.DataFrame.from_records(data)
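Note that the filter value NEW YORK contains a space, so it has to be URL-encoded before it goes over the wire; requests takes care of that for you. To see an encoded form of the URL without making a request, here is a small standard-library sketch. build_url is a hypothetical helper name, not part of requests or sodapy.

```python
from urllib.parse import urlencode

BASE_URL = "http://data.ny.gov/resource/yg7h-zjbf.json"


def build_url(base_url: str, params: dict) -> str:
    """Append URL-encoded query parameters to base_url."""
    # safe="$" leaves the leading "$" of SODA parameters such as $limit
    # unescaped; spaces in values are encoded as "+".
    return f"{base_url}?{urlencode(params, safe='$')}"


url = build_url(BASE_URL, {"$limit": 50000, "business_city": "NEW YORK"})
print(url)
```

The printed URL ends in ?$limit=50000&business_city=NEW+YORK, which can be pasted into a browser to check what the endpoint returns for that filter.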
Comparison of output - Verify that the two methods return the same result
assert df_socrata.equals(df)
Upvotes: 1
Reputation: 11
Socrata exposes a JSON endpoint for each dataset through its API. You can find it by selecting API in the top right-hand corner of the dataset page. For this solution I use just requests to retrieve the data. The sodapy module is nice to use, but it does the same thing as a plain request.
import pandas as pd
import requests
data = requests.get('http://data.ny.gov/resource/yg7h-zjbf.json?$limit=50000&business_city=NEW YORK').json()
df = pd.DataFrame.from_records(data)
df
Upvotes: 1