Hamilton
Hamilton

Reputation: 678

how to correctly fetch dataframe from url in python?

I want to automate data fetching from USDA website, where I am specifically interested in few categories for data selection. To do so, I tried the following:

import io
import requests
import pandas as pd

url = 'https://www.marketnews.usda.gov/mnp/ls-report-retail?&repType=summary&portal=ls&category=Retail&species=BEEF&startIndex=1'

query_list = {"Report Type":"item","species":"BEEF","portal":"ls","category":"Retail", "Regions":"National", "Grades":"ALL", "Cut": "All", "Dates_from":"2019-03-01", "Dates_to":"2021-02-01"}
req = requests.get(url, params=query_list)
df = pd.read_csv(io.StringIO(req.text), sep="\s\s+", engine="python")
df.to_csv("usda_report.csv")

but I couldn't get the expected dataframe that I want, here is the output that after I tried to run above attempt:

ParserError: Expected 1 fields in line 117, saw 2. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

desired output

I need to pass these queries to do correct data selection: Category = "Retail"; Report Type = "Item"; Species = "Beef"; Region(s) = "National"; Dates_from = "2019-03-01"; Dates_to = "2021-02-15".

ideally, I want to pass those queries and want to get the following dataframe (head of dataframe):

enter image description here

update

in my desired outputs, I need those columns: Date, Region, Grade, Cut, Retail Items, Outlets or number of stores, Weighted Avg

from the above attempt, I couldn't get the output dataframe like this. How should I fetch data correctly? Can anyone suggest possible of doing this right in pandas? any idea?

Upvotes: 0

Views: 326

Answers (1)

Håken Lid
Håken Lid

Reputation: 23064

You must add the query parameter format=text to get the data in csv format from this web site.

url = 'https://www.marketnews.usda.gov/mnp/ls-report-retail'
query_list = {
    "format":"text", 
    "repType":"item",
    "species":"BEEF",
    "portal":"ls",
    "region":"NATIONAL",
    "cut":"0",
    "repDate":"03/01/2019", 
    "endDate":"02/01/2021",
}
req = requests.get(url, params=query_list)
df = pd.read_csv(io.StringIO(req.text), sep="\s\s+", engine="python")

You might have to modify the query parameters further. You can use the web site with your browser and change the filters you want. Then you can convert the current query parameters in the url to json with this command in the javascript terminal.

JSON.stringify(Object.fromEntries(new URLSearchParams(location.search)), null, 2)

Upvotes: 1

Related Questions