Reputation: 936
I need to scrape all the comments from an online newspaper article. Comments are loaded through an API:
https://api-graphql.lefigaro.fr/graphql?id=widget-comments_prod_commentsQuery2_a54719015f409774c55c77471444274cc461e7ee30bcbe03c8449e39ae15b16c&variables={%22id%22:%22bGVmaWdhcm8uZnJfXzYwMjdkODk4LTJjZWQtMTFlYi1hYmNlLTMyOGIwNDdhZjcwY19fQXJ0aWNsZQ==%22,%22page%22:2}
Therefore I am using requests to get its content:
import json
import requests

url = "https://api-graphql.lefigaro.fr/graphql?id=widget-comments_prod_commentsQuery2_a54719015f409774c55c77471444274cc461e7ee30bcbe03c8449e39ae15b16c"
params = "variables={%22id%22:%22bGVmaWdhcm8uZnJfXzYwMjdkODk4LTJjZWQtMTFlYi1hYmNlLTMyOGIwNDdhZjcwY19fQXJ0aWNsZQ==%22,%22page%22:2}"
response = requests.get(url, params).json()
print(json.dumps(response, indent=4))
But I need a for loop over the page number so I can get every comment, since only 10 comments are returned per page.
I can't find a way to do it. I tried to use .format() with params like this:
params = "variables={%22id%22:%22bGVmaWdhcm8uZnJfXzYwMjdkODk4LTJjZWQtMTFlYi1hYmNlLTMyOGIwNDdhZjcwY19fQXJ0aWNsZQ==%22,%22page%22:{page_numb}}".format("page_numb":2)
But I get a SyntaxError.
Upvotes: 0
Views: 48
Reputation: 10809
str.format takes keyword arguments, not "key": value pairs, so .format("page_numb":2) is a SyntaxError. Try "...{page_numb}".format(page_numb=2).
Also, since the { and } characters are part of the query payload, you'll have to escape them: { becomes {{ and } becomes }}. For example, "{hello {foo}}".format("foo": "world") becomes "{{hello {foo}}}".format(foo="world"). You'll also have to decode the url-encoded string:
from urllib.parse import unquote
params = unquote("variables={{%22id%22:%22bGVmaWdhcm8uZnJfXzYwMjdkODk4LTJjZWQtMTFlYi1hYmNlLTMyOGIwNDdhZjcwY19fQXJ0aWNsZQ==%22,%22page%22:{page_numb}}}".format(page_numb=2))
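To get from there to the loop the question asks for, here is a minimal sketch. It assumes the endpoint accepts id and variables as ordinary query parameters (as the URL in the question suggests) and that requests may do the URL-encoding itself. The helper names build_variables, build_variables_with_format and fetch_pages are made up for illustration, and the page range is something you pass in, since the question doesn't show how the response signals the last page.

```python
import json
from urllib.parse import unquote

BASE_URL = "https://api-graphql.lefigaro.fr/graphql"
QUERY_ID = "widget-comments_prod_commentsQuery2_a54719015f409774c55c77471444274cc461e7ee30bcbe03c8449e39ae15b16c"
ARTICLE_ID = "bGVmaWdhcm8uZnJfXzYwMjdkODk4LTJjZWQtMTFlYi1hYmNlLTMyOGIwNDdhZjcwY19fQXJ0aWNsZQ=="


def build_variables(page_numb):
    """Build the JSON text for the `variables` parameter with json.dumps,
    which avoids brace escaping entirely."""
    return json.dumps({"id": ARTICLE_ID, "page": page_numb}, separators=(",", ":"))


def build_variables_with_format(page_numb):
    """Same result via str.format: doubled braces are literal,
    single braces are placeholders filled by keyword arguments."""
    template = '{{"id":"{article_id}","page":{page_numb}}}'
    return template.format(article_id=ARTICLE_ID, page_numb=page_numb)


def fetch_pages(first_page, last_page):
    """Fetch pages first_page..last_page (inclusive) and return the parsed JSON
    of each. Hypothetical helper: the stop condition is left to the caller."""
    import requests  # third-party; imported here so the helpers above run without it

    pages = []
    for page_numb in range(first_page, last_page + 1):
        # Assumption: passing a dict lets requests encode the values for us.
        params = {"id": QUERY_ID, "variables": build_variables(page_numb)}
        response = requests.get(BASE_URL, params=params)
        response.raise_for_status()
        pages.append(response.json())
    return pages
```

Calling fetch_pages(1, 5) would request pages 1 to 5. Inspect one response to find out how an empty or final page looks before deciding where to stop.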
Upvotes: 1