How to extract content from webpage as seen from browser using python

Question

I am trying to extract the data on this page: "https://www.ncbi.nlm.nih.gov/nucleotide/209750423?report=genbank#". When I use urllib, I only get the content that the browser shows when I right-click and choose 'view page source'. What I want is the actual sequence, 'atggctgaga tgaaaaacct gaaaattgag gtggtgcgct ataacccgga....', which is visible when I right-click and select 'inspect element' but not in 'view page source'.

The code I am using is:

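import urllib.request

url = "https://www.ncbi.nlm.nih.gov/nucleotide/209750423?report=genbank"
# Fetch the raw HTML (the same thing 'view page source' shows) and save it.
# read() returns bytes in Python 3, so the file is opened in binary mode.
with urllib.request.urlopen(url) as response, open('out.html', 'wb') as f:
    f.write(response.read())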

spectras · Accepted Answer

You should take the time to actually look at the page you want to scrape. It is just a shell that loads a JavaScript application, and that application then loads the actual data from a separate URL (the request shows up in the Network tab of your browser's developer tools):

https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=209750423&db=nuccore&dopt=genbank&retmode=text
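As a minimal sketch of how that URL could be used from Python 3, the code below requests the text record directly and pulls the sequence out of it. The function names (build_viewer_url, extract_sequence, fetch_sequence) are made up for illustration, the query parameters are copied as-is from the URL above, and the parser assumes the response is a standard GenBank flat file whose sequence sits between an ORIGIN line and a closing "//". It also assumes the endpoint still responds the way it did when this answer was written.

```python
import urllib.parse
import urllib.request

# Endpoint taken from the URL in the answer; parameter names are copied from it.
VIEWER_URL = "https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi"


def build_viewer_url(record_id):
    """Build the plain-text GenBank URL for a nucleotide record ID."""
    params = {
        "val": record_id,
        "db": "nuccore",
        "dopt": "genbank",
        "retmode": "text",
    }
    return VIEWER_URL + "?" + urllib.parse.urlencode(params)


def extract_sequence(genbank_text):
    """Return the sequence from the ORIGIN section of a GenBank flat file.

    Assumes the usual layout: an 'ORIGIN' line, then lines such as
    '        1 atggctgaga tgaaaaacct ...', terminated by '//'.
    """
    bases = []
    in_origin = False
    for line in genbank_text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            in_origin = False
        elif in_origin:
            # Drop the leading position number, keep the groups of bases.
            bases.extend(line.split()[1:])
    return "".join(bases)


def fetch_sequence(record_id):
    """Download the record as text and return its sequence (needs network access)."""
    with urllib.request.urlopen(build_viewer_url(record_id)) as response:
        text = response.read().decode("utf-8")
    return extract_sequence(text)
```

Calling fetch_sequence(209750423) would then return the sequence as one string with the position numbers and spaces removed. Keeping the parsing in its own function means it can be checked against a saved copy of the record without hitting the server again.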

By the way, be sure to check copyright issues before scraping online content.
