user53558

Reputation: 367

How to scrape Instagram account info in Python

I am trying to do something extremely simple in Python, yet somehow it's very difficult. All I want to do is write a Python script that records the number of people an Instagram user is following and the number of its followers. That's it.

Can anyone point me to a good package to do this? Preferably not Beautiful Soup, as that is overly complicated for what I want to do. I just want something like

[user: example_user, followers:9019, following:217] 

Is there an Instagram-specific Python library?

The account I want to scrape is public. This is very simple to do for Twitter.

Any help is appreciated.

Upvotes: 5

Views: 16981

Answers (6)

Mohammad Turk

Reputation: 87

import requests

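username = "cristiano"
# Send a browser-like User-Agent, since the default one used by requests may be rejected.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

# The ?__a=1 query string requests the profile data as JSON rather than HTML.
user_info = requests.get('https://instagram.com/%s/?__a=1' % username, headers=headers)

print(user_info.json())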
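To get just the two counts out of that JSON, something like the sketch below may work. It assumes the response nests the profile under graphql -> user, with the counts under edge_followed_by and edge_follow; those key names are an assumption about the response layout, which Instagram can change at any time, and the endpoint may also require a login. The sample dict is made up so that the function can run without a network call.

```python
def extract_counts(payload):
    """Pull username, follower count and following count out of the JSON payload.

    Assumed layout (not guaranteed by Instagram):
    payload["graphql"]["user"]["edge_followed_by"]["count"] and
    payload["graphql"]["user"]["edge_follow"]["count"].
    Missing keys yield None instead of raising.
    """
    user = payload.get("graphql", {}).get("user", {})
    return {
        "user": user.get("username"),
        "followers": user.get("edge_followed_by", {}).get("count"),
        "following": user.get("edge_follow", {}).get("count"),
    }


# Hypothetical sample shaped like the assumed response.
sample = {
    "graphql": {
        "user": {
            "username": "example_user",
            "edge_followed_by": {"count": 9019},
            "edge_follow": {"count": 217},
        }
    }
}

print(extract_counts(sample))
```

With a real response you would call extract_counts(user_info.json()) and check for None values before using the result.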

Upvotes: 1

Chris Greening

Reputation: 550

You could use instascrape to do this with just a few lines of code (disclaimer: I am the author of this package).

Install it with pip install insta-scrape, and then, to get a user's account information, try

from instascrape import Profile 
google = Profile("google")
google.scrape()

This loads a couple dozen data points from the account, which you can access with dot notation, e.g. google.followers, google.following, google.is_verified, etc. Alternatively, you can get all the data as a dict with google.to_dict():

{'csrf_token': '19DnM5UYbxusoSnbfUNGGiOr5hU91khz',
 'viewer': None,
 'viewer_id': None,
 'country_code': 'US',
 'language_code': 'en',
 'locale': 'en_US',
 'device_id': 'A0CFC9ED-5769-4951-94B3-F26D5724FDBD',
 'browser_push_pub_key': 'BIBn3E_rWTci8Xn6P9Xj3btShT85Wdtne0LtwNUyRQ5XjFNkuTq9j4MPAVLvAFhXrUU1A9UxyxBA7YIOjqDIDHI',
 'key_id': '132',
 'public_key': 'a185b716b7bab1acb25e88034374819c0482257a4e240736215af2253f255d61',
 'version': '10',
 'is_dev': False,
 'rollout_hash': '7b740aa85a82',
 'bundle_variant': 'metro',
 'frontend_dev': 'prod',
 'logging_page_id': 'profilePage_1067259270',
 'show_suggested_profiles': False,
 'show_follow_dialog': False,
 'biography': 'Google unfiltered—sometimes with filters.',
 'blocked_by_viewer': False,
 'business_email': '',
 'restricted_by_viewer': None,
 'country_block': False,
 'external_url': 'https://linkin.bio/google',
 'external_url_linkshimmed': 'https://l.instagram.com/?u=https%3A%2F%2Flinkin.bio%2Fgoogle&e=ATOMCBcW4YjsNBxlDyAETnOiWt8zHrGTW0VJIufW-ROhSYM5lm2p-JNT060OLDBmMFuoszepQpW0cfEf&s=1',
 'followers': 12262801,
 'followed_by_viewer': False,
 'following': 30,
 'follows_viewer': False,
 'full_name': 'Google',
 'has_ar_effects': False,
 'has_clips': True,
 'has_guides': False,
 'has_channel': False,
 'has_blocked_viewer': False,
 'highlight_reel_count': 6,
 'has_requested_viewer': False,
 'id': '1067259270',
 'is_business_account': True,
 'is_joined_recently': False,
 'business_category_name': 'Business & Utility Services',
 'overall_category_name': None,
 'category_enum': 'INTERNET_COMPANY',
 'is_private': False,
 'is_verified': True,
 'mutual_followers': 0,
 'profile_pic_url': 'https://scontent-lga3-1.cdninstagram.com/v/t51.2885-19/s150x150/119515245_239175997499686_2853342285794408974_n.jpg?_nc_ht=scontent-lga3-1.cdninstagram.com&_nc_ohc=_vp0OGMhUrEAX9mEskb&oh=242d04421b13f2545952203069b164b6&oe=5FC05FDB',
 'profile_pic_url_hd': 'https://scontent-lga3-1.cdninstagram.com/v/t51.2885-19/s320x320/119515245_239175997499686_2853342285794408974_n.jpg?_nc_ht=scontent-lga3-1.cdninstagram.com&_nc_ohc=_vp0OGMhUrEAX9mEskb&oh=604348546412db230d638292b39f7abe&oe=5FC2E223',
 'requested_by_viewer': False,
 'username': 'google',
 'connected_fb_page': None,
 'posts': 1416}

If you really do just need a few data points, you can pass their names explicitly through the keys argument of scrape:

from instascrape import Profile 
google = Profile("google")
google.scrape(keys=['followers', 'following'])

and google.to_dict() will then give us

{'followers': 12262807, 'following': 30}

Upvotes: 0

YOGESHWARAN R

Reputation: 1

There is a package called instagramy:

pip install instagramy

from instagramy import InstagramUser
user = InstagramUser("github")
profile_pic = user.profile_pic_url
print(user.is_verified)
print(user.number_of_followers)
print(user.number_of_posts)

GitHub repository of the package

Upvotes: 0

SIM

Reputation: 22440

As the content you are looking for is available in the page source, you can fetch it using requests in combination with BeautifulSoup.

Give it a try:

import requests
from bs4 import BeautifulSoup

html = requests.get('https://www.instagram.com/michaeljackson/')
soup = BeautifulSoup(html.text, 'lxml')
item = soup.select_one("meta[property='og:description']")
name = item.find_previous_sibling().get("content").split("•")[0]
followers = item.get("content").split(",")[0]
following = item.get("content").split(",")[1].strip()
print(f'{name}\n{followers}\n{following}')

Results:

Name :Michael Jackson
Followers :1.6m
Following :4

Upvotes: 5

adder

Reputation: 3698

I don't know why you would want to avoid BeautifulSoup, since it is actually quite convenient for tasks like this. Something along the following lines should do the job:

import requests
from bs4 import BeautifulSoup

html = requests.get('https://www.instagram.com/cristiano/') # input URL here
soup = BeautifulSoup(html.text, 'lxml')

data = soup.find_all('meta', attrs={'property':'og:description'})
text = data[0].get('content').split()

user = '%s %s %s' % (text[-3], text[-2], text[-1])
followers = text[0]
following = text[2]

print('User:', user)
print('Followers:', followers)
print('Following:', following)

Output:

User: Cristiano Ronaldo (@cristiano)
Followers: 111.5m
Following: 387

Of course, you would need to do some calculations to get an actual (yet truncated) number in cases where the user has more than 1m followers (or is following more than 1m users), which should not be too difficult.
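A minimal sketch of that conversion, assuming the abbreviated strings use the suffixes k, m and b (only m appears in the output above; the others are a guess) and may contain thousands separators. parse_count is a name made up for this example.

```python
def parse_count(text):
    """Convert an abbreviated count such as '111.5m' or '9,019' to an int.

    The result is only as precise as the input: '111.5m' becomes
    111500000, not the account's exact follower count.
    """
    # Assumed suffixes; only 'm' is confirmed by the scraped output.
    multipliers = {"k": 1_000, "m": 1_000_000, "b": 1_000_000_000}
    cleaned = text.strip().lower().replace(",", "")
    if cleaned and cleaned[-1] in multipliers:
        # round() guards against floating-point error in the multiplication.
        return int(round(float(cleaned[:-1]) * multipliers[cleaned[-1]]))
    return int(cleaned)


print(parse_count("111.5m"))
print(parse_count("387"))
```

You could then call parse_count(followers) and parse_count(following) on the strings extracted from the meta tag.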

Upvotes: 4

JustOneQuestion

Reputation: 332

Otherwise, you can access the information this way (yes, I used BeautifulSoup):

from bs4 import BeautifulSoup
import urllib.request

# Replace <instagramname> with the username of the account.
external_sites_html = urllib.request.urlopen('https://www.instagram.com/<instagramname>/?hl=en')
soup = BeautifulSoup(external_sites_html, 'lxml')

name = soup.find('meta', attrs={'property':'og:title'})
description = soup.find('meta', attrs={'property':'og:description'})

# name of user
nameContent = name.get('content')
# information about followers and following users
descrContent = description.get('content')

From those variables you can extract the information you need. However, the follower count will be imprecise if the account has more than 1 million followers. If you need the exact number, you may have to use their API.
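As a rough sketch of that extraction step, a regular expression can pull the two counts out of the description string. The sample text below is an assumption about what the og:description content looks like ("<n> Followers, <n> Following, ..."); the real wording may differ or change, in which case the function returns None. parse_description is a name made up for this example.

```python
import re

# Matches e.g. "111.5m Followers, 387 Following" (assumed wording).
COUNTS_PATTERN = re.compile(
    r"([\d.,]+[kmb]?)\s+Followers,\s+([\d.,]+[kmb]?)\s+Following",
    re.IGNORECASE,
)


def parse_description(content):
    """Return the follower/following strings from the description, or None."""
    match = COUNTS_PATTERN.search(content)
    if match is None:
        return None
    return {"followers": match.group(1), "following": match.group(2)}


# Hypothetical description text in the assumed format.
sample = ("111.5m Followers, 387 Following, 2,000 Posts - "
          "See Instagram photos and videos from Cristiano Ronaldo (@cristiano)")

print(parse_description(sample))
```

With the code above you would pass descrContent to parse_description. The values stay as strings because large counts are abbreviated in the page.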

Upvotes: 1
