Reputation: 33
I want to extract comments from a website. With this code I succeeded in extracting them.
import requests
from urllib.parse import unquote
url = 'https://apicomment.detik.com/graphql'
payload = {"query":"query search($type: String!, $size: Int!,$anchor: Int!, $sort: String!, $adsLabelKanal: String, $adsEnv: String, $query: [ElasticSearchAggregation]) {\nsearch(type: $type, size: $size,page: $anchor, sort: $sort,adsLabelKanal: $adsLabelKanal, adsEnv: $adsEnv, query: $query){\npaging sorting counter counterparent profile hits {\nposisi hasAds results {\n id author content like prokontra status news create_date pilihanredaksi refer liker { id } reporter { id status_report } child { id child parent author content like prokontra status create_date pilihanredaksi refer liker { id } reporter { id status_report } authorRefer } } } }}","variables":{"type":"comment","sort":"newest","size":10,"anchor":1,"query":[{"name":"news.artikel","terms":5307853},{"name":"news.site","terms":"dtk"}],"adsLabelKanal":"detik_finance","adsEnv":"desktop"}}
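while True:
    # Request one page of comments from the API
    r = requests.post(url, json=payload)
    container = r.json()['data']['search']['hits']['results']
    if not container:
        # An empty page means there are no more comments
        break
    for item in container:
        # Skip entries that have no author information
        if not item['author']:
            continue
        print(item['author']['name'], unquote(item['content']))
    # Move on to the next page
    payload['variables']['anchor'] += 1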
But I don't really understand this code, especially these lines:
url = 'https://apicomment.detik.com/graphql'
payload = {"query":"query search($type: String!, $size: Int!,$anchor: Int!, $sort: String!, $adsLabelKanal: String, $adsEnv: String, $query: [ElasticSearchAggregation]) {\nsearch(type: $type, size: $size,page: $anchor, sort: $sort,adsLabelKanal: $adsLabelKanal, adsEnv: $adsEnv, query: $query){\npaging sorting counter counterparent profile hits {\nposisi hasAds results {\n id author content like prokontra status news create_date pilihanredaksi refer liker { id } reporter { id status_report } child { id child parent author content like prokontra status create_date pilihanredaksi refer liker { id } reporter { id status_report } authorRefer } } } }}","variables":{"type":"comment","sort":"newest","size":10,"anchor":1,"query":[{"name":"news.artikel","terms":5307853},{"name":"news.site","terms":"dtk"}],"adsLabelKanal":"detik_finance","adsEnv":"desktop"}}
The URL is different from the website's address, but the output is exactly what I want. Can someone explain this to me and point me to some references?
Upvotes: 0
Views: 44
Reputation: 2439
The URL is different because you are not extracting the comments from the website itself but from a comment API. The API provides a simple way to search for comments without reverse-engineering the website.
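The payload is how you tell the API what you are looking for. In this case the endpoint is a GraphQL one, so the payload has a "query" string that names the fields to return and a "variables" object that supplies the values (article ID, page number, page size, sort order). There is probably some documentation about how exactly your payload has to be formatted for this API.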
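To make the role of the payload more concrete, here is a minimal sketch that builds it from named parameters and decodes one response. The helper names build_payload and parse_comments are hypothetical, and the shortened query is an assumption: it reuses field names from the payload in the question but has not been checked against the live API, which may require the full query.

```python
from urllib.parse import unquote

# Shortened GraphQL query. Field names come from the question's payload;
# trimming the selection like this is an assumption, not a documented form.
QUERY = (
    "query search($type: String!, $size: Int!, $anchor: Int!, "
    "$sort: String!, $query: [ElasticSearchAggregation]) {\n"
    "  search(type: $type, size: $size, page: $anchor, "
    "sort: $sort, query: $query) {\n"
    "    hits { results { id author content create_date } }\n"
    "  }\n"
    "}"
)


def build_payload(article_id, page=1, size=10, site="dtk"):
    """Return the JSON body for one page of comments (hypothetical helper)."""
    return {
        "query": QUERY,
        "variables": {
            "type": "comment",
            "sort": "newest",
            "size": size,
            "anchor": page,  # passed to the server as the `page` argument
            "query": [
                {"name": "news.artikel", "terms": article_id},
                {"name": "news.site", "terms": site},
            ],
        },
    }


def parse_comments(response_json):
    """Yield (author_name, decoded_text) pairs from one decoded response."""
    results = response_json["data"]["search"]["hits"]["results"]
    for item in results:
        if not item["author"]:
            continue  # skip entries without author information
        yield item["author"]["name"], unquote(item["content"])
```

The result of build_payload(5307853, page=1) could be passed as json= to requests.post(url, json=...) in the loop from the question, and r.json() handed to parse_comments. Incrementing page then replaces the manual payload['variables']['anchor'] += 1.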
Upvotes: 1