Reputation: 11525
If you navigate to the following URL, select Search By Country, and then enter AE for Holder Country, as shown in the following:
After you press search, you will notice an XHR call to the following API, which is a POST request.
Here it is:
As you can see, there is a value for qz, and I can't work out how it is generated, which I need in order to call the API and paginate through the results.
Does anyone have a clue how to call that API and handle the pagination?
The furthest I got was locating the JS functions that handle the encoding of the parameters, here.
I've already tried Selenium with a proxy rotation service, but I was detected after retrieving a few pages.
Upvotes: 10
Views: 886
Reputation: 20052
You need to generate a wipo-visitor-uunid and pass it to the POST request as a cookie, along with a bunch of other stuff.
The code that generates the wipo-visitor-uunid is this:
(function (){
    //generate unique visitor id cookie
    if (!Math.imul) Math.imul = function(opA, opB) {
        opB |= 0;
        var result = (opA & 0x003fffff) * opB;
        if (opA & 0xffc00000) result += (opA & 0xffc00000) * opB |0;
        return result |0;
    };
    var _cuunid = 'wipo-visitor-uunid=';
    uunid(0);
    function uunid(force){
        if (force || document.cookie.indexOf(_cuunid)===-1){
            var value = navigator.userAgent + Date.now() + Math.random().toString().substring(2,11);
            var cookie = _cuunid + cyrb53(value) + ';expires=Jan 2 2034 00:00:00; path=/; SameSite=Lax; domain=.wipo.int';
            document.cookie = cookie;
        }
    }
    function cyrb53(str, seed) {
        seed = seed || 0;
        let h1 = 0xdeadbeef ^ seed, h2 = 0x8badf00d ^ seed;
        for (let i = 0, ch; i < str.length; i++) {
            ch = str.charCodeAt(i);
            h1 = Math.imul(h1 ^ ch, 2654435761);
            h2 = Math.imul(h2 ^ ch, 1597334677);
        }
        h1 = Math.imul(h1 ^ h1>>>16, 2246822507) ^ Math.imul(h2 ^ h2>>>13, 3266489909);
        h2 = Math.imul(h2 ^ h2>>>16, 2246822507) ^ Math.imul(h1 ^ h1>>>13, 3266489909);
        // return 4294967296 * (2097151 & h2) + (h1>>>0);
        return (h2>>>0).toString(16)+(h1>>>0).toString(16);
    }
}());
The wipo-visitor-uunid cookie is valid until Jan 2 2034, so once you have it, you should be fine.
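If you would rather generate a fresh value than reuse a copied one, the JavaScript above can be ported to Python. The following is only a sketch: the names `imul`, `cyrb53` and `make_visitor_uunid` are my own, and I am assuming the server only checks that the cookie is present and looks like the hex string the page produces, which I have not verified.

```python
import random
import time

MASK32 = 0xFFFFFFFF


def imul(a: int, b: int) -> int:
    """Low 32 bits of a * b: the bit pattern Math.imul produces, kept unsigned."""
    return (a * b) & MASK32


def cyrb53(text: str, seed: int = 0) -> str:
    """Port of the page's cyrb53 variant, returning the same unpadded hex form."""
    h1 = (0xDEADBEEF ^ seed) & MASK32
    h2 = (0x8BADF00D ^ seed) & MASK32
    for ch in text:
        code = ord(ch)  # same as charCodeAt for characters in the BMP
        h1 = imul(h1 ^ code, 2654435761)
        h2 = imul(h2 ^ code, 1597334677)
    h1 = imul(h1 ^ (h1 >> 16), 2246822507) ^ imul(h2 ^ (h2 >> 13), 3266489909)
    h2 = imul(h2 ^ (h2 >> 16), 2246822507) ^ imul(h1 ^ (h1 >> 13), 3266489909)
    return format(h2, "x") + format(h1, "x")


def make_visitor_uunid(user_agent: str) -> str:
    """Mimic uunid(): hash user agent + millisecond timestamp + random digits."""
    millis = int(time.time() * 1000)
    digits = str(random.random())[2:11]
    return cyrb53(f"{user_agent}{millis}{digits}")
```

Calling `make_visitor_uunid(...)` with the same user-agent string you send in your headers gives a value you can store under the `wipo-visitor-uunid` cookie key.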
Oh, and that string that you add to the POST request seems to be the query region result, but I'm not sure how it's generated. More on that in the other answers to this question.
Here's the code, test it out on your end:
import json

import requests

# The qz value copied from the browser's XHR request, split over several lines.
query_string = (
    "qz=N4IgLgngDgpiBcIBGAnAhgOwCYgDQgBs0EQYM8QBHASxIAYBaGAOSwAUAO"
    "AMzAHY0AYgHcAWtQCuADQD2WNAQBeADyRIhCgIIBFLABlpANQIARAEIBNABIQ"
    "AVlwCi0gKoBZALwVK4mN4QBGfAB9Ej8/Og46EABfIAAA="
)

with requests.Session() as s:
    # Pick up the site's own cookies first, then add the visitor id.
    the_cookies = s.get("https://www3.wipo.int/branddb/en/").cookies.get_dict()
    the_cookies["wipo-visitor-uunid"] = "994c22024f522fd"
    s.headers["user-agent"] = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"
    s.headers["X-Requested-With"] = "XMLHttpRequest"
    s.headers["Referer"] = "https://www3.wipo.int/branddb/en/"
    end_point = f"https://www3.wipo.int/branddb/jsp/select.jsp?{query_string}"
    your_precious_data = s.post(end_point, cookies=the_cookies).json()
    print(json.dumps(your_precious_data, indent=2))
This should return an output that looks like this:
{
  "lastUpdated": 1616081900884,
  "sv": "www3.wipo.int",
  "response": {
    "docs": [
      {
        "OO": "NZ",
        "score": 1,
        "STATUS": "PEND",
        "MTY": [
          "Word"
        ],
        "AD": "2021-03-17T23:59:59Z",
        "HOL": [
          "PONSONBY DOGS LIMITED"
        ],
        "NC": [
          43
        ],
        "SOURCE": "NZTM",
        "DOC": "36/03/1173603_20210317.1919.xml.gz",
        "ID": "NZTM.1173603",
        "BRAND": [
          "Good Dog"
        ],
        "HOLC": [
          "NZ"
        ]
      },
and much, much more data ...
Upvotes: 7
Reputation:
The qz value is JSON "encoded" using LZString.compressToBase64.
The qi value seems to be initially taken from qk in the source HTML, with 0- prepended to it.
var qk = "ooooooooooooooooooo";
// if(!(w == 790 && (h == 600 || h == 590)))
qk = "yj0IAlhpQGl9BLWmmmJ2WMuzofkYFis64bmU5/6mE8w=";
Certain requests require the number to be incremented after you make them.
You also need the cookie given in the other answer.
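As a rough illustration of the qk/qi step, here is a small Python sketch. The helper names `extract_qk` and `build_qi` are hypothetical, and two things are assumptions on my part: that the last qk assignment in the page source is the one that counts, and that the number to increment is the leading 0 in front of the dash.

```python
import re

# Sample copied from the snippet above; the real page source may differ.
SAMPLE_HTML = '''
var qk = "ooooooooooooooooooo";
// if(!(w == 790 && (h == 600 || h == 590)))
qk = "yj0IAlhpQGl9BLWmmmJ2WMuzofkYFis64bmU5/6mE8w=";
'''

QK_PATTERN = re.compile(r'\bqk\s*=\s*"([^"]*)"')


def extract_qk(html: str) -> str:
    """Return the value of the last qk assignment found in the page source."""
    matches = QK_PATTERN.findall(html)
    if not matches:
        raise ValueError("no qk assignment found")
    return matches[-1]


def build_qi(qk: str, counter: int = 0) -> str:
    """Prefix qk with a counter, e.g. '0-<qk>' (assumed to be the number that increments)."""
    return f"{counter}-{qk}"


if __name__ == "__main__":
    qk = extract_qk(SAMPLE_HTML)
    print(build_qi(qk))      # value for the first request
    print(build_qi(qk, 1))   # value after one increment, if that is how it works
```

In practice you would pass the text of the page fetched with your session to `extract_qk` instead of `SAMPLE_HTML`.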
Upvotes: 16