Awais Tariq

Reputation: 23

Can't get data from inside of span-tag with beautifulsoup

I am trying to scrape an Instagram page and want to access the div tags inside the span tag, but I can't. The HTML of the Instagram page looks like this:

 <head>--</head>
    <body>
       <span id="react-root" aria-hidden="false">
       <form enctype="multipart/form-data" method="POST" role="presentation">…</form>
       <section class="_9eogI E3X2T">
          <main class="SCxLW  o64aR" role="main">
             <div class="v9tJq VfzDr">
                 <header class=" HVbuG">…</header>
                 <div class="_4bSq7">…</div>
                 <div class="fx7hk">…</div>
             </div>
          </main>
      </section>
    </body>

This is what I do:

from bs4 import BeautifulSoup
import urllib.request as urllib2
html_page = urllib2.urlopen("https://www.instagram.com/cherrified_/?hl=en")
soup = BeautifulSoup(html_page,"lxml")
span_tag = soup.find('span')  # returns the span tag correctly
span_tag.find_all('div')      # returns an empty list, why?

Please also provide an example.

Upvotes: 0

Views: 438

Answers (1)

vekerdyb

Reputation: 1263

Instagram is a single-page application powered by React, which means its HTML source is just a nearly "empty" page that loads JavaScript. That JavaScript generates the content dynamically in the browser after the page has been downloaded.

Click "View source" or go to view-source:https://www.instagram.com/cherrified_/?hl=en in Chrome. This is the HTML you download with urllib.request.

You can see that there is a single <span> tag, and it does not contain any <div> tags. (Note: a <div> inside a <span> is not valid HTML.)
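Since you asked for an example, here is a minimal sketch of the difference. It does not contact Instagram: STATIC_HTML and RENDERED_HTML are hypothetical stand-ins I made up (the class names are borrowed from your question, the app.js script name is invented), and divs_inside_first_span is just an illustrative helper. It uses the built-in "html.parser" backend instead of "lxml" so that it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for what the server sends: the React root is empty
# and the content is supposed to be filled in later by JavaScript.
STATIC_HTML = """
<html><head></head><body>
<span id="react-root"></span>
<script src="app.js"></script>
</body></html>
"""

# Hypothetical stand-in for the DOM you see in the browser's inspector
# after the JavaScript has run.
RENDERED_HTML = """
<html><head></head><body>
<span id="react-root">
  <section><main>
    <div class="v9tJq VfzDr">
      <div class="_4bSq7"></div>
      <div class="fx7hk"></div>
    </div>
  </main></section>
</span>
</body></html>
"""


def divs_inside_first_span(html):
    """Return every <div> nested inside the first <span> of the document."""
    soup = BeautifulSoup(html, "html.parser")
    span_tag = soup.find("span")
    if span_tag is None:
        return []
    return span_tag.find_all("div")


static_divs = divs_inside_first_span(STATIC_HTML)
rendered_divs = divs_inside_first_span(RENDERED_HTML)

print(len(static_divs))    # nothing to find in the downloaded source
print(len(rendered_divs))  # the divs only exist in the rendered DOM
```

The parsing code is the same in both cases; only the input differs. What urllib.request downloads corresponds to the first string, not to the second one that you see in the developer tools.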

Scraping instagram.com this way, by parsing only the downloaded HTML, is not possible. It also might not be legal (I am not a lawyer).

Notes:

  • Your HTML code example doesn't include a closing tag for <span>.
  • Your HTML code example doesn't match the link you provide in the Python snippet.
  • In the last line of the Python snippet you probably meant span_tag.find_all('div') (note the variable name and the singular 'div').

Upvotes: 1
