Dhruv Chaudhary

Reputation: 1

Import all rows from dataset using SODA API Python

I'm trying to import the following dataset and store it in a pandas dataframe: https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh/data

I use the following code:

 import pandas as pd
 import requests

 r = requests.get('https://data.nasa.gov/resource/gh4g-9sfh.json')
 meteor_data = r.json()
 df = pd.DataFrame(meteor_data)
 print(df.shape)

The resulting dataframe only has 1000 rows. I need it to have all 45,716 rows. How do I do this?

Upvotes: 0

Views: 1778

Answers (3)

Romasa Qasim

Reputation: 48

I know this question was asked a long time ago, but I am replying so that other users may benefit. I have modified @Rachit Kumar's code slightly, setting the limit to a very large number so that all the data is downloaded.

import pandas as pd
from sodapy import Socrata

# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("data.nasa.gov", None)

# Example authenticated client (needed for non-public datasets):
# client = Socrata("data.nasa.gov",
#                  MyAppToken,
#                  username="[email protected]",
#                  password="AFakePassword")

# Up to 999,999 results (more than the dataset holds), returned as JSON from
# the API / converted to Python list of dictionaries by sodapy.
results = client.get("gh4g-9sfh", limit=999999)

# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)
results_df

Note that the limit only needs to be larger than the number of rows in the dataset (45,716 here); any sufficiently large number works.

Upvotes: 0

Rachit Kumar

Reputation: 36

Do it like this and set the limit:

import pandas as pd
from sodapy import Socrata

# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("data.nasa.gov", None)

# Example authenticated client (needed for non-public datasets):
# client = Socrata("data.nasa.gov",
#                  MyAppToken,
#                  username="[email protected]",
#                  password="AFakePassword")

# First 2000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("gh4g-9sfh", limit=2000)

# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)

Upvotes: 0

WaLinke

Reputation: 778

Check out the docs on the $limit parameter

The $limit parameter controls the total number of rows returned, and it defaults to 1,000 records per request.

Note: The maximum value for $limit is 50,000 records, and if you exceed that limit you'll get a 400 Bad Request response.

So you're just getting the default number of records back.

You will not be able to get more than 50,000 records in a single API call; that takes multiple calls using $limit together with $offset.

Your dataset has 45,716 rows, so a single call with $limit=50000 is enough. Try:

https://data.nasa.gov/resource/gh4g-9sfh.json?$limit=50000

See Why am I limited to 1,000 rows on SODA API when I have an App Key
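
For datasets larger than a single request allows, the $limit/$offset paging mentioned above might look roughly like the sketch below. The helper names `fetch_page` and `fetch_all_rows` are made up for illustration, and it uses the standard library's `urllib` instead of `requests` only to avoid extra dependencies. The `$order=:id` parameter is there because paging is only reliable with a stable sort order. I have not run this against the live endpoint, so treat it as a starting point rather than a tested solution.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

DATASET_URL = "https://data.nasa.gov/resource/gh4g-9sfh.json"


def fetch_page(limit: int, offset: int) -> List[Dict]:
    """Fetch one page of rows (hypothetical helper, not part of any library)."""
    # $order=:id keeps the row order stable between requests while paging.
    query = urlencode({"$limit": limit, "$offset": offset, "$order": ":id"})
    with urlopen(f"{DATASET_URL}?{query}", timeout=60) as response:
        return json.load(response)


def fetch_all_rows(
    get_page: Callable[[int, int], List[Dict]] = fetch_page,
    page_size: int = 50000,
) -> List[Dict]:
    """Call get_page(limit, offset) repeatedly until a short page comes back."""
    rows: List[Dict] = []
    offset = 0
    while True:
        page = get_page(page_size, offset)
        rows.extend(page)
        # A page shorter than page_size means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
    return rows
```

Passing the result to `pd.DataFrame(fetch_all_rows())` should then give one dataframe with every row. If the row count is an exact multiple of `page_size`, the loop makes one extra request that returns an empty page before stopping.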

Upvotes: 0
