Reputation: 678
I want to automate data fetching from USDA website, where I am specifically interested in few categories for data selection. To do so, I tried the following:
import io
import requests
import pandas as pd
url = 'https://www.marketnews.usda.gov/mnp/ls-report-retail?&repType=summary&portal=ls&category=Retail&species=BEEF&startIndex=1'
query_list = {"Report Type":"item","species":"BEEF","portal":"ls","category":"Retail", "Regions":"National", "Grades":"ALL", "Cut": "All", "Dates_from":"2019-03-01", "Dates_to":"2021-02-01"}
req = requests.get(url, params=query_list)
df = pd.read_csv(io.StringIO(req.text), sep="\s\s+", engine="python")
df.to_csv("usda_report.csv")
but I couldn't get the expected dataframe that I want, here is the output that after I tried to run above attempt:
ParserError: Expected 1 fields in line 117, saw 2. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
desired output
I need to pass these queries to do correct data selection: Category = "Retail"; Report Type = "Item"; Species = "Beef"; Region(s) = "National"; Dates_from = "2019-03-01"; Dates_to = "2021-02-15"
.
ideally, I want to pass those queries and want to get the following dataframe (head of dataframe):
update
in my desired outputs, I need those columns: Date
, Region
, Grade
, Cut
, Retail Items
, Outlets
or number of stores
, Weighted Avg
from the above attempt, I couldn't get the output dataframe like this. How should I fetch data correctly? Can anyone suggest possible of doing this right in pandas? any idea?
Upvotes: 0
Views: 326
Reputation: 23064
You must add the query parameter format=text
to get the data in csv format from this web site.
url = 'https://www.marketnews.usda.gov/mnp/ls-report-retail'
query_list = {
"format":"text",
"repType":"item",
"species":"BEEF",
"portal":"ls",
"region":"NATIONAL",
"cut":"0",
"repDate":"03/01/2019",
"endDate":"02/01/2021",
}
req = requests.get(url, params=query_list)
df = pd.read_csv(io.StringIO(req.text), sep="\s\s+", engine="python")
You might have to modify the query parameters further. You can use the web site with your browser and change the filters you want. Then you can convert the current query parameters in the url to json with this command in the javascript terminal.
JSON.stringify(Object.fromEntries(new URLSearchParams(location.search)), null, 2)
Upvotes: 1