Tyler

Reputation: 67

How to continuously parse instagram following with GET query_hash with python3?

I am working on a simple project to get better at Python. I am using the requests library to get "https://www.instagram.com/graphql/query/?query_hash=58712303d941c6855d4e888c5f0cd22f&variables=%7B%22id%22%3A%2225025320%22%2C%22first%22%3A24%7D", which returns the first batch of accounts loaded when you click "following" on an Instagram profile (https://www.instagram.com/instagram/following/). My question is: how can I parse the entire following list? I searched online and could not find anything that demonstrates how to keep requesting the next query_hash URL. Here is my current code:

# Library imports
import requests
import json
import time

# Variables
LOGIN_URL = 'https://www.instagram.com/accounts/login/ajax/'
REFERER_URL = 'https://www.instagram.com/accounts/login/'
USER_AGENT = 'Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1'
USERNAME = 'username'
PASSWD = 'password'
IGQ = r"https://www.instagram.com/graphql/query/?query_hash=58712303d941c6855d4e888c5f0cd22f&variables=%7B%22id%22%3A%2225025320%22%2C%22first%22%3A24%7D"

# Session variables
session = requests.Session()
req = session.get(LOGIN_URL)
# Update the existing headers in one step; reassigning session.headers afterwards would drop the Referer
session.headers.update({'user-agent': USER_AGENT, 'Referer': REFERER_URL})
session.headers.update({'x-csrftoken': req.cookies['csrftoken']})
login_data = {'username': USERNAME, 'password': PASSWD}
login = session.post(LOGIN_URL, data=login_data, allow_redirects=True)
session.headers.update({'x-csrftoken': login.cookies['csrftoken']})

# Parse followings
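def parse():
    try:
        following = session.get(IGQ)
        # Decode the JSON body of the GraphQL response
        test_text = following.json()
        usernames = []

        j = test_text['data']['user']['edge_follow']
        for each in j['edges']:
            usernames.append(each['node']['username'])
        print(usernames)

    except (ValueError, KeyError, TypeError):
        # Body was not JSON or lacked the expected keys, which usually means the login failed
        print("Couldn't login.")

parse()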

I can currently parse the first batch of followings successfully, but I'm not sure how to go about parsing the rest. In Chrome DevTools, the next request sent when scrolling is: https://www.instagram.com/graphql/query/?query_hash=58712303d941c6855d4e888c5f0cd22f&variables=%7B%22id%22%3A%2225025320%22%2C%22first%22%3A12%2C%22after%22%3A%22AQB-48qzOZue7n4BHPi7FETk2TQnrPl5LiWJKl2nsPCUkLcralRpeTo6F3zQze71zjKh7iDypwv4yxR6OOyHwYj-r1hU5S-P1QaMlRn59i3emA%22%7D
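
For reference, the `variables` parameter in that URL is just percent-encoded JSON. The short sketch below decodes it with the standard library only; the names `next_url` and `variables` are chosen here for illustration. It suggests that `after` carries the same value as `end_cursor` in the previous response.

```python
import json
from urllib.parse import parse_qs, urlsplit

# The "next request" URL copied from the question
next_url = (
    "https://www.instagram.com/graphql/query/"
    "?query_hash=58712303d941c6855d4e888c5f0cd22f"
    "&variables=%7B%22id%22%3A%2225025320%22%2C%22first%22%3A12%2C%22after%22%3A%22"
    "AQB-48qzOZue7n4BHPi7FETk2TQnrPl5LiWJKl2nsPCUkLcralRpeTo6F3zQze71zjKh7iDypwv4yxR6OOyHwYj-r1hU5S-P1QaMlRn59i3emA"
    "%22%7D"
)

# parse_qs percent-decodes each value and returns a list per key
query = parse_qs(urlsplit(next_url).query)

# "variables" is a JSON document holding id, first and after
variables = json.loads(query["variables"][0])

print(variables["id"])     # user id being queried
print(variables["first"])  # page size requested
print(variables["after"])  # cursor marking where the previous page ended
```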

Here is the JSON I am working with (the response to the first URL):

    {
      data: {
        user: {
          edge_follow: {
            count: 193,
            edges: [
              {node: {id: "1298763699", username: "mrbentley_thedog", full_name: "Mister Bentley",…}},
              {node: {id: "28892894", username: "guskenworthy", full_name: "gus kenworthy",…}},
              {node: {id: "26633036", username: "anitta", full_name: "anitta 🎤",…}},
              {node: {id: "433479649", username: "puffytails", full_name: "Puffytails Trio",…}},
              {node: {id: "6106847", username: "ttlyteala", full_name: "Teala Dunn",…}},
              {node: {id: "10766410", username: "wrenees", full_name: "Renee Lusano",…}},
              {node: {id: "18428658", username: "kimkardashian", full_name: "Kim Kardashian West",…}},
              {node: {id: "320996985", username: "jugglinjosh", full_name: "Josh Horton",…}},
              {node: {id: "177402262", username: "lelepons", full_name: "Lele Pons",…}},
              {node: {id: "1390031219", username: "brycexavier", full_name: "bryce xavier",…}},
              {node: {id: "1081938380", username: "katieaustin", full_name: "Katie Austin",…}},
              {node: {id: "284216174", username: "susiemeoww", full_name: "Susie Shu 🍒",…}},
              {node: {id: "2786948", username: "theshoesurgeon", full_name: "Dominic Chambrone",…}},
              {node: {id: "182973434", username: "laurengodwin", full_name: "lauren godwin🧚🏼‍♀️✨",…}},
              {node: {id: "16911665", username: "laurdiy", full_name: "Lauren Riihimaki",…}},
              {node: {id: "2077685663", username: "ninja", full_name: "Tyler Blevins",…}},
              {node: {id: "1194735637", username: "mannymua733", full_name: "🌙Manny Gutierrez",…}},
              {node: {id: "5603022012", username: "wildspotted", full_name: "Ida",…}},
              {node: {id: "32085887", username: "lisafreestyle", full_name: "Lisa Zimouche",…}},
              {node: {id: "241302041", username: "kiliiiyuyan", full_name: "Kiliii Yuyan",…}},
              {node: {id: "496200129", username: "ts_abe", full_name: "T.S ABE",…}},
              {node: {id: "424605784", username: "laetitiaky", full_name: "KY",…}},
              {node: {id: "2516489", username: "michaelwalchalk", full_name: "Michael Walchalk",…}},
              {node: {id: "3833293301", username: "afrosinsanjuan", full_name: "Photographing Afro Caribbeans",…}}
            ],
            page_info: {
              end_cursor: "AQB-48qzOZue7n4BHPi7FETk2TQnrPl5LiWJKl2nsPCUkLcralRpeTo6F3zQze71zjKh7iDypwv4yxR6OOyHwYj-r1hU5S-P1QaMlRn59i3emA",
              has_next_page: true
            }
          }
        }
      },
      status: "ok"
    }

So essentially, what I'm trying to do is collect the JSON responses for every page of the following list. I'm not sure how to go about this, and any help is greatly appreciated.

Upvotes: 1

Views: 4592

Answers (1)

finicky

Reputation: 36

In the response there's a key called "end_cursor" (under page_info). Use end_cursor to paginate.

Pass the end_cursor value from each response as "after" in the next request, and keep going until has_next_page is false. You can leave "after" blank on the first request.

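import json

usernames = []
has_next_page = True
end_cursor = None
IGQ = "https://www.instagram.com/graphql/query/"
while has_next_page:
    # id, first and after go inside the JSON-encoded "variables" parameter, as in the URLs from the question
    variables = {"id": "25025320", "first": 24}
    if end_cursor:
        variables["after"] = end_cursor
    payload = {"query_hash": "58712303d941c6855d4e888c5f0cd22f", "variables": json.dumps(variables)}

    following = session.get(IGQ, params=payload).json()
    edge_follow = following['data']['user']['edge_follow']
    usernames.extend(edge['node']['username'] for edge in edge_follow['edges'])

    has_next_page = edge_follow['page_info']['has_next_page']
    end_cursor = edge_follow['page_info']['end_cursor']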
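
If it helps to try the loop without hitting Instagram, here is a minimal, self-contained sketch of the same cursor pagination. It assumes the response shape shown in the question; `build_params`, `iter_usernames` and `fake_fetch` are hypothetical helper names made up for this example, and `fake_fetch` only imitates two pages of results.

```python
import json
from typing import Callable, Dict, Iterator, Optional

QUERY_HASH = "58712303d941c6855d4e888c5f0cd22f"  # taken from the question
USER_ID = "25025320"                             # taken from the question


def build_params(end_cursor: Optional[str], page_size: int = 24) -> Dict[str, str]:
    """Build the query string parameters for one page (hypothetical helper)."""
    variables = {"id": USER_ID, "first": page_size}
    if end_cursor:
        # Only send "after" once a cursor from a previous page is available
        variables["after"] = end_cursor
    return {"query_hash": QUERY_HASH, "variables": json.dumps(variables)}


def iter_usernames(fetch_page: Callable[[Dict[str, str]], dict]) -> Iterator[str]:
    """Yield usernames page by page until has_next_page is false."""
    end_cursor = None
    while True:
        response = fetch_page(build_params(end_cursor))
        edge_follow = response["data"]["user"]["edge_follow"]
        for edge in edge_follow["edges"]:
            yield edge["node"]["username"]
        page_info = edge_follow["page_info"]
        if not page_info["has_next_page"]:
            break
        end_cursor = page_info["end_cursor"]


def fake_fetch(params: Dict[str, str]) -> dict:
    """Stand-in for the network call: serves two made-up pages keyed by cursor."""
    pages = {None: (["alice", "bob"], "cursor-1"), "cursor-1": (["carol"], None)}
    after = json.loads(params["variables"]).get("after")
    names, next_cursor = pages[after]
    return {"data": {"user": {"edge_follow": {
        "edges": [{"node": {"username": name}} for name in names],
        "page_info": {"has_next_page": next_cursor is not None,
                      "end_cursor": next_cursor},
    }}}}


all_usernames = list(iter_usernames(fake_fetch))
print(all_usernames)
```

With the logged-in session from the question, the fake could be swapped for something like `lambda params: session.get("https://www.instagram.com/graphql/query/", params=params).json()`. Adding a short `time.sleep` between pages is probably a sensible precaution against rate limiting.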

Upvotes: 2
