Tor Tor

Reputation: 11

Getting text from a website using Python and pandas

I have a list of reaction names that I want to look up in ModelSeed (basically "https://modelseed.org/biochem/reactions/" + reaction name). For each name, I want to get the KEGG pathway.

For instance, for the reaction "rxn00020", the function would go to https://modelseed.org/biochem/reactions/rxn00020 and from there give me "KEGG: rn00500 (Starch and sucrose metabolism)". I tried following this thread but didn't manage to get anything done... Can you help me? Thanks a lot!

Upvotes: 0

Views: 45

Answers (2)

ce.teuf

Reputation: 786

Take a look at the Network tab of your browser's Web Inspector: the data you want is loaded through XHR requests, so you can query that endpoint directly.

After the loop below, res contains what you want.

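import requests as rq

reaction_names = ["rxn00020"]
res = {}
# Solr endpoint the ModelSeed page itself queries (visible in the Network tab)
base_url = "https://modelseed.org/solr/reactions/select?wt=json&q=id:"

for reac_name in reaction_names:
    resp = rq.get(base_url + reac_name, timeout=30)
    resp.raise_for_status()  # fail loudly on HTTP errors
    docs = resp.json()['response']['docs']
    # Store None when the reaction is not found or has no 'pathways' field
    res[reac_name] = docs[0].get('pathways') if docs else None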
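If you want to keep the URL building and the JSON handling separate from the network call, here is a minimal sketch. The helper names build_query_url and extract_pathways are my own (hypothetical, not part of any library), and the sample payload in the comment only mirrors the response -> docs -> pathways nesting used above; the real value of 'pathways' may look different.

```python
from urllib.parse import urlencode

# Same Solr endpoint as in the loop above
SOLR_SELECT_URL = "https://modelseed.org/solr/reactions/select"


def build_query_url(reaction_id):
    """Build the query URL for one reaction ID, URL-encoding the parameters."""
    params = {"wt": "json", "q": "id:" + reaction_id}
    return SOLR_SELECT_URL + "?" + urlencode(params)


def extract_pathways(payload):
    """Return the 'pathways' field of the first doc, or None if there is none."""
    docs = payload.get("response", {}).get("docs", [])
    if not docs:
        return None
    return docs[0].get("pathways")


# Hypothetical payload, only to show the expected nesting:
# {"response": {"docs": [{"id": "rxn00020", "pathways": "..."}]}}
```

With requests, the loop body then becomes something like res[name] = extract_pathways(rq.get(build_query_url(name), timeout=30).json()).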

Upvotes: 0

Sushil

Reputation: 5531

The page contents are loaded dynamically, so a plain HTTP request for the page won't contain them. One way to scrape them is with Selenium, which renders the page in a real browser. Here is how you do it:

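from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()

urls = ['https://modelseed.org/biochem/reactions/rxn00020']  # list of all your URLs

for url in urls:
    driver.get(url)
    time.sleep(1.5)  # crude wait for the dynamic content to render
    # find_elements_by_class_name was removed in Selenium 4; use find_elements(By.CLASS_NAME, ...)
    # On this page the KEGG line is the second-to-last 'ng-binding' element
    kegg = driver.find_elements(By.CLASS_NAME, 'ng-binding')[-2]
    print(kegg.text)

driver.quit()  # close the browser when done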

Output:

KEGG: rn00500 (Starch and sucrose metabolism)
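If you need the pathway ID and its name as separate values (for example, to put them in two columns), a small regex is probably enough. This is only a sketch: parse_kegg_line is a hypothetical helper name, and it assumes the text always has the shape "KEGG: <id> (<name>)" with a single pathway, as in the output above.

```python
import re

# Assumed shape: "KEGG: <pathway id> (<pathway name>)", one pathway per line
KEGG_PATTERN = re.compile(r"^KEGG:\s*(\S+)\s*\((.+)\)\s*$")


def parse_kegg_line(text):
    """Split a 'KEGG: id (name)' line into (id, name); return None if it doesn't match."""
    match = KEGG_PATTERN.match(text.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_kegg_line("KEGG: rn00500 (Starch and sucrose metabolism)"))
```

In the loop above you would call parse_kegg_line(kegg.text) instead of printing the raw text, and handle the None case for pages where the element holds something else.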

Upvotes: 1
