Reputation: 197
I am trying to scrape an HTML table from the URL below.
Using the Chrome developer tools, I found that the actual data comes from a redirected URL, so I wrote the following code:
import requests
headers={'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36'}
data = {'CW_SEARCHID': 'JCCHT05S',
'CW_JAPANKBN': '2',
'CW_IMPKBN': '2',
'CW_YMKBN': '1',
'CW_SYY': '2020',
'CW_SMM': '12',
'CW_HSKBN': '2',
'CW_HSCODE': '230660',
'CW_KUNIKBN': '1',
'CW_ZMKBN': '1',
'CW_MEISAICNT': '200'}
newurl = "https://www.customs.go.jp/JCWSV02/servlet/JCWSV02"
r2 = requests.post(newurl,headers=headers,data=data)
print (r2.text)
However, the above code does not return the table results, and I do not know why.
What I have tried so far:
I added the cookies and sent the full form body as shown below, but the result was the same.
cookies = {'visid_incap_763612':'dXFhIavZRrW8jvit8CkY9zirL2AAAAAAQUIPAAAAAACs+oxBQjxSp9TdZl25YI/Y','incap_ses_948_763612':'lRxvM4bArjwmSaqzRPknDYHIL2AAAAAAFXLtsiRyEhyFOCzgsz8MXA=='}
data = "CW_SEARCHID=JCCHT05S&CW_JAPANKBN=2&CW_IMPKBN=2&CW_CARGOKBN=&CW_SUMKBN=&CW_SPCODE=&CW_SPNAME=&CW_YMSORTKBN=&CW_SISUKBN=&CW_SENKIKBN=&CW_HKKBN=&CW_YMKBN=1&CW_KI=&CW_SYY=2020&CW_EYY=&CW_SMM=12&CW_EMM=&CW_HSKBN=2&CW_HSCODE=230660&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_KUNIKBN=1&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_ZMKBN=1&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_MEISAICNT=200"
r2 = requests.post(newurl,headers=headers,data=data,cookies=cookies,verify=True)
Can anyone give me some advice?
Upvotes: 1
Views: 257
Reputation: 22440
Try the following to get the required response. It turns out that you need to send all the keys and values in data, including the ones that are left empty, which your code did not do. As a quick test, I ran the following and got the expected results:
import requests
headers = {
    # The header name is 'User-Agent' (hyphen, not '=')
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36',
    'referer': 'https://www.customs.go.jp/toukei/srch/jccht00p.htm',
    'content-type': 'application/x-www-form-urlencoded'
}
data = "CW_SEARCHID=JCCHT05S&CW_JAPANKBN=2&CW_IMPKBN=2&CW_CARGOKBN=&CW_SUMKBN=&CW_SPCODE=&CW_SPNAME=&CW_YMSORTKBN=&CW_SISUKBN=&CW_SENKIKBN=&CW_HKKBN=&CW_YMKBN=1&CW_KI=&CW_SYY=2020&CW_EYY=&CW_SMM=12&CW_EMM=&CW_HSKBN=2&CW_HSCODE=230660&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_HSCODE=&CW_HSNAME=&CW_KUNIKBN=1&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_KUNICODE=&CW_KUNINAME=&CW_ZMKBN=1&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_ZMCODE=&CW_ZMNAME=&CW_MEISAICNT=200"
newurl = "https://www.customs.go.jp/JCWSV02/servlet/JCWSV02"
r2 = requests.post(newurl,headers=headers,data=data)
print (r2.text)
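The form body above repeats keys such as CW_HSCODE and CW_HSNAME many times, which a plain Python dict cannot represent. As a minimal sketch (the helper names parse_form_body and build_form_body are made up for illustration, and the sample string is a shortened stand-in for the real body), the standard library can turn the raw string into an ordered list of pairs that is easier to edit, and then encode it back:

```python
from urllib.parse import parse_qsl, urlencode


def parse_form_body(raw_body):
    """Split a form-encoded string into ordered (key, value) pairs.

    keep_blank_values=True preserves fields such as 'CW_HSNAME=' that
    would otherwise be dropped.
    """
    return parse_qsl(raw_body, keep_blank_values=True)


def build_form_body(pairs):
    """Encode ordered (key, value) pairs back into a form body string."""
    return urlencode(pairs)


# Shortened stand-in for the full body used in the request above.
sample = (
    "CW_SEARCHID=JCCHT05S&CW_HSCODE=230660&CW_HSNAME="
    "&CW_HSCODE=&CW_HSNAME=&CW_MEISAICNT=200"
)

pairs = parse_form_body(sample)

# The list keeps every repeated key, in order, so it round-trips.
round_tripped = build_form_body(pairs)

# A dict keeps only the last value per key, so the HS code is lost.
collapsed = dict(pairs)
```

Here collapsed["CW_HSCODE"] ends up empty because the later blank field overwrites 230660, which is one reason to keep the body as a string or as a list of pairs. As far as I know, requests also accepts a list of tuples for data and encodes it as a form for you, so passing pairs directly should be a workable alternative to the long string; I have not verified that against this particular server.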
Upvotes: 2