JMH

Reputation: 303

Extracting follower count from Instagram

I am trying to pull the number of followers from a list of Instagram accounts. I have tried using the "find" string method on the text returned by Requests; however, the string that I see when I inspect the actual Instagram page no longer appears when I print "r" from the code below.

I was able to get this code to run successfully in the past, but it no longer works. Related question: Webscraping Instagram follower count BeautifulSoup

import requests

user = "espn"
url = 'https://www.instagram.com/' + user
r = requests.get(url).text

start = '"edge_followed_by":{"count":'
end = '},"followed_by_viewer"'

print(r[r.find(start)+len(start):r.rfind(end)])

The find method returns -1, which means the substring was not found within the variable "r".
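For context, str.find does not raise an error when the substring is missing. It returns -1, and the slice built from that index then quietly yields unrelated text. A minimal sketch of a guarded version follows; the helper name extract_between and the sample string are made up for illustration and are not real Instagram output.

```python
def extract_between(text, start, end):
    """Return the text between start and end, or None if either marker is missing."""
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)
    # Search for the end marker only after the start marker.
    j = text.find(end, i)
    if j == -1:
        return None
    return text[i:j]


# Made-up sample for illustration, not real Instagram output.
sample = '{"edge_followed_by":{"count":123},"followed_by_viewer":false}'
start = '"edge_followed_by":{"count":'
end = '},"followed_by_viewer"'

print(extract_between(sample, start, end))           # 123
print(extract_between("<html></html>", start, end))  # None
```

Returning None makes the "marker not found" case explicit instead of letting a -1 index leak into the slice.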

Upvotes: 3

Views: 6707

Answers (2)

mwx

Reputation: 331

I want to suggest an updated solution to this question, because Derek Eden's 2019 answer no longer works, as noted in its comments.

The fix was to add the r prefix (a raw string) to the regular expression passed to re.search, like so:

follower_count = re.search(r'"edge_followed_by\\":{\\"count\\":([0-9]+)}', response).group(1)

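The r prefix is essential here. Without it, Python processes the backslash escapes in the string literal first, so each \\ collapses to a single backslash before the regex engine sees the pattern. The pattern then matches a bare " rather than the \" that appears in the page, and the search returns no result.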
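The difference can be checked offline. The sketch below runs the pattern with and without the r prefix against the page slice quoted in this answer; the variable names are chosen only for this illustration.

```python
import re

# Page slice quoted in this answer: the JSON is embedded with escaped quotes.
html = r'... mRL&s=1\",\"edge_followed_by\":{\"count\":110070},\"fbid\":\"1784 ...'

raw_pattern = r'"edge_followed_by\\":{\\"count\\":([0-9]+)}'
plain_pattern = '"edge_followed_by\\":{\\"count\\":([0-9]+)}'

# Without the r prefix, each \\ has already collapsed to a single backslash.
print(plain_pattern)  # "edge_followed_by\":{\"count\":([0-9]+)}

print(re.search(raw_pattern, html).group(1))  # 110070
print(re.search(plain_pattern, html))         # None
```

In the non-raw version the regex engine sees \" and treats it as an escaped quote, so it never looks for the literal backslash that is present in the page.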

The Instagram page also seems to escape the quotes in the object we are looking for with backslashes, at least in my tests. The code I use is the following; it runs on Python 3.10 and worked as of July 2022:

# get follower count of instagram profile
import os.path
import requests
import re
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# get instagram follower count
def get_instagram_follower_count(instagram_username):
    url = "https://www.instagram.com/" + instagram_username
    filename = "instagram.html"

    try:
        if not os.path.isfile(filename):
            r = requests.get(url, verify=False)
            print(r.status_code)
            print(r.text)
            response = r.text

            if r.status_code != 200:
                raise Exception("Error: " + str(r.status_code))
            
            with open(filename, "w") as f:
                f.write(response)

        else:
            with open(filename, "r") as f:
                response = f.read()
                # print(response)

        follower_count = re.search(r'"edge_followed_by\\":{\\"count\\":([0-9]+)}', response).group(1)
        return follower_count

    except Exception as e:
        print(e)
        return 0


print(get_instagram_follower_count('your.instagram.profile'))

The method returns the follower count as expected. Note that I added a few lines that save the response to a file, so that repeated test runs do not hammer Instagram's web server and get me blocked.
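The file cache can also be pulled out into a small reusable helper. The following is only a sketch: cached_fetch, its max_age_seconds parameter, and the stub fetch function are hypothetical names for this illustration, and the expiry check is an addition that the code above does not have.

```python
import os
import tempfile
import time


def cached_fetch(path, fetch, max_age_seconds=3600):
    """Return the cached text at path if it is fresh, else call fetch() and cache it."""
    if os.path.isfile(path):
        age = time.time() - os.path.getmtime(path)
        if age < max_age_seconds:
            with open(path, "r", encoding="utf-8") as f:
                return f.read()
    text = fetch()
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text


# Stub standing in for the real HTTP request, so the example runs offline.
calls = []

def fake_fetch():
    calls.append(1)
    return "<html>stub</html>"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "instagram.html")
    first = cached_fetch(path, fake_fetch)
    second = cached_fetch(path, fake_fetch)  # served from the file

print(first == second, len(calls))  # True 1
```

In real use, fetch could be something like lambda: requests.get(url).text, and a stale file would be refreshed automatically instead of having to be deleted by hand.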

This is a slice of the original html content that contains the part we are looking for:

... mRL&s=1\",\"edge_followed_by\":{\"count\":110070},\"fbid\":\"1784 ...

I debugged the regex in RegExr, and it works fine at this point in time.
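The same check can be reproduced locally. Since the markup has appeared both with and without the backslashes, one option is a pattern that accepts either form and reports a failed match explicitly. This is a sketch under assumptions: parse_follower_count is a hypothetical helper, the optional-backslash pattern is one possible approach rather than anything Instagram documents, and the unescaped sample is made up.

```python
import re

# Optional backslash before each quote, so both \"count\" and "count" match.
FOLLOWERS_RE = re.compile(r'\\?"edge_followed_by\\?":\{\\?"count\\?":([0-9]+)\}')


def parse_follower_count(page_source):
    """Return the follower count as an int, or None if the pattern is not found."""
    match = FOLLOWERS_RE.search(page_source)
    if match is None:
        return None
    return int(match.group(1))


# Escaped form: the slice quoted in this answer.
escaped = r'... mRL&s=1\",\"edge_followed_by\":{\"count\":110070},\"fbid\":\"1784 ...'
# Unescaped form: made-up sample in the older layout.
unescaped = '{"edge_followed_by":{"count":14061730},"followed_by_viewer":false}'

print(parse_follower_count(escaped))          # 110070
print(parse_follower_count(unescaped))        # 14061730
print(parse_follower_count("<html></html>"))  # None
```

Checking for None avoids the AttributeError that .group(1) raises when re.search finds nothing.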

There are many posts about the regex r prefix, like this one.

The documentation of the re module also shows clearly that this is the issue with the code above.

Upvotes: 0

Derek Eden

Reputation: 4618

I think it's because of the last ' in start and the first ' in end. This will work:

import requests
import re

user = "espn"
url = 'https://www.instagram.com/' + user
r = requests.get(url).text
followers = re.search(r'"edge_followed_by":{"count":([0-9]+)}', r).group(1)

print(followers)

'14061730'

Upvotes: 4
