Reputation: 1
I'm trying to import the following dataset and store it in a pandas dataframe: https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh/data
I use the following code:
import pandas as pd
import requests

r = requests.get('https://data.nasa.gov/resource/gh4g-9sfh.json')
meteor_data = r.json()
df = pd.DataFrame(meteor_data)
print(df.shape)
The resulting dataframe only has 1000 rows. I need it to have all 45,716 rows. How do I do this?
Upvotes: 0
Views: 1778
Reputation: 48
This question is old, but I'm answering in case it helps someone else. I modified @Rachit Kumar's code slightly, setting the limit to a number far larger than the dataset so that all the data is downloaded.
import pandas as pd
from sodapy import Socrata
# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("data.nasa.gov", None)
# Example authenticated client (needed for non-public datasets):
# client = Socrata("data.nasa.gov",
# MyAppToken,
# username="[email protected]",
# password="AFakePassword")
# All results (the limit is set well above the 45,716 rows in the dataset),
# returned as JSON from API / converted to Python list of dictionaries by sodapy.
results = client.get("gh4g-9sfh", limit=999999)
# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)
results_df
Note that the limit only needs to be larger than the number of rows in the dataset (45,716 here).
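If you would rather not guess a limit, sodapy also has a get_all method that accepts the same arguments as get and returns a generator that pages through the results for you. Below is a minimal sketch, assuming your installed sodapy version provides get_all; the helper name fetch_rows and its max_rows parameter are my own and not part of sodapy.

```python
from itertools import islice


def fetch_rows(client, dataset_id, max_rows=None):
    """Collect rows from client.get_all into a list.

    client is expected to be a sodapy Socrata instance (or anything with a
    compatible get_all method). max_rows is a hypothetical safety cap;
    None means "take everything".
    """
    # get_all returns a generator that requests further pages as needed
    rows = client.get_all(dataset_id)
    return list(islice(rows, max_rows))
```

With the client created above, results_df = pd.DataFrame.from_records(fetch_rows(client, "gh4g-9sfh")) should then give the full dataset.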
Upvotes: 0
Reputation: 36
Do it like this and set the limit:
import pandas as pd
from sodapy import Socrata
# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("data.nasa.gov", None)
# Example authenticated client (needed for non-public datasets):
# client = Socrata("data.nasa.gov",
# MyAppToken,
# username="[email protected]",
# password="AFakePassword")
# First 2000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("gh4g-9sfh", limit=2000)
# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)
Upvotes: 0
Reputation: 778
Check out the docs on the $limit parameter:
The $limit parameter controls the total number of rows returned, and it defaults to 1,000 records per request.
Note: The maximum value for $limit is 50,000 records, and if you exceed that limit you'll get a 400 Bad Request response.
So you're just getting the default number of records back.
You will not be able to get more than 50,000 records in a single API call. Anything larger takes multiple calls using $limit together with $offset.
Try:
https://data.nasa.gov/resource/gh4g-9sfh.json?$limit=50000
See "Why am I limited to 1,000 rows on SODA API when I have an App Key".
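For datasets larger than a single request allows, the $limit/$offset paging described above could look roughly like the sketch below. It uses only the standard library so it runs without extra packages. The function names fetch_page and fetch_all are my own, and the $order=:id parameter is an assumption on my part, added to keep the row order stable between requests.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://data.nasa.gov/resource/gh4g-9sfh.json"


def fetch_page(limit, offset):
    """Fetch one page of rows from the endpoint as a list of dicts."""
    # $order is assumed here so that consecutive pages do not overlap
    query = urlencode({"$limit": limit, "$offset": offset, "$order": ":id"})
    with urlopen(f"{BASE_URL}?{query}") as response:
        return json.load(response)


def fetch_all(get_page=fetch_page, page_size=50000):
    """Request pages until one comes back short, then return all rows."""
    rows = []
    offset = 0
    while True:
        page = get_page(page_size, offset)
        rows.extend(page)
        # A page shorter than page_size means there is nothing left
        if len(page) < page_size:
            break
        offset += page_size
    return rows
```

Calling pd.DataFrame(fetch_all()) should then build the dataframe from every row. For this dataset (45,716 rows) a single page of 50,000 is enough, so the loop only matters for larger datasets or smaller page sizes.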
Upvotes: 0