Esteban Jimenez

Reputation: 17

How can I access this website's table and its contents?

I'm currently working on extracting specific data from a table on this website: http://bioinfo.life.hust.edu.cn/lncRNASNP/#!/lnc_disease. For this, I'm using Python and Selenium.

My problem is that when I try to read the table using read_html() from pandas, only the headers of the table are found. This is the output I get:

 lncRNA ID Chromosome   Disease    Pubmed   P-value Bonferroni   Variant     miRNA      Gain      Loss
0  No items   No items  No items  No items  No items   No items  No items  No items  No items  No items

And this is the code I used:

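import pandas as pd
from selenium import webdriver

link = "http://bioinfo.life.hust.edu.cn/lncRNASNP/#!/lnc_disease"
driver = webdriver.Chrome()  # assumed browser; the original snippet did not show how the driver was created
driver.get(link)
df = pd.read_html(driver.page_source)[0]
print(df.head())
driver.close()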

Also, if I try to fetch the page source so I can extract the table directly from it (using the BeautifulSoup and requests libraries), I get completely different HTML from what I see when I inspect an element in Chrome, and the program reports "No tables found".

Am I doing something wrong or is the table just inaccessible via these methods?

Upvotes: 1

Views: 142

Answers (1)

baduker

Reputation: 20052

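The page most likely fills the table in with JavaScript after it loads, which would explain why read_html() sees only "No items" and why requests gets different HTML from what Chrome shows. There's a backend endpoint that serves the table data as JSON, so why not just grab that?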

Here's how:

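import pandas as pd
import requests
from tabulate import tabulate

disease_list_url = "http://bioinfo.life.hust.edu.cn/lncRNASNP/api/exp_disease_list"
# The endpoint returns JSON; the table rows sit under the "lncrna_gene_list" key.
response = requests.get(disease_list_url, timeout=30)
response.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
df = pd.DataFrame(response.json()["lncrna_gene_list"])
print(tabulate(df))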

Output:

--  -  --------  -----  ---------------  --        --  ---  ---  --  ---    --  --  ---  -------------------------  --  --  ---  -  --
 0  0  21961160  chr1   NONHSAT001955.2  45         3  347  526   3  602    49  32   94  Alzheimer's disease        14   0   80  0   2
 1  0  20124551  chr1   NONHSAT007671.2   4         0   87   79   2   82     0   9   22  autoimmune disease          3   0   19  0   0
 2  0  18982067  chr1   NONHSAT009623.2   0        22   62   92  15  101     0   0   72  AIDS                       11   0   61  0  17
 3  0  15478311  chr1   NONHSAT010193.2  35         0  227  325   0  326    44  26  173  affective disorders        41   0  132  0   0
 4  0  23791884  chr1   NONHSAT010193.2  35         0  227  325   0  326    44  26  173  Autism spectrum disorder   41   0  132  0   0
 5  0  19606485  chr1   NONHSAT010193.2  35         0  227  325   0  326    44  26  173  Autism spectrum disorder   41   0  132  0   0
 6  0  22817756  chr1   NONHSAT010193.2  35         0  227  325   0  326    44  26  173  Autism spectrum disorder   41   0  132  0   0
 7  0  22019903  chr11  NONHSAT017462.2  38         0  400  770   0  806    37  23  119  adrenocortical carcinomas  29  11   90  0   0
 8  0  21954592  chr11  NONHSAT017462.2  38         0  400  770   0  806    37  23  119  atherosclerosis            29  11   90  0   0
 9  0  22067257  chr11  NONHSAT017531.2  10         0   69  126   0  168    11   3   94  aging                      15   0   79  0   0
10  0  10340388  chr11  NONHSAT018662.2  18         0   81  140   0  130    13  11   84  acute myeloid leukemia     10   0   74  0   0
11  0  17940140  chr11  NONHSAT018662.2  18         0   81  140   0  130    13  11   84  acute myeloid leukemia     10   0   74  0   0
12  0  18587408  chr11  NONHSAT024403.2   0         7   73  113   5  170     0   0   49  Alzheimer's disease        10   0   39  0   9
13  0  21785702  chr11  NONHSAT024403.2   0         7   73  113   5  170     0   0   49  Alzheimer's disease        10   0   39  0   9
14  0  22817756  chr11  NONHSAT024403.2   0         7   73  113   5  170     0   0   49  Alzheimer's disease        10   0   39  0   9
--  -  --------  -----  ---------------  --        --  ---  ---  --  ---    --  --  ---  -------------------------  --  --  ---  -  --

The endpoint is paginated. To get the second page, add a page query parameter to the URL:

disease_list_url = "http://bioinfo.life.hust.edu.cn/lncRNASNP/api/exp_disease_list?page=2"
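
If you need more than one page, the same idea can be wrapped in a loop. This is only a rough sketch under a few assumptions that I have not verified against the site: that ?page=1 returns the same data as the bare URL, that every page keeps its rows under "lncrna_gene_list", and that a page past the end comes back with an empty list. The helper names fetch_page and collect_rows are made up for this example, and it uses urllib from the standard library instead of requests just to keep the sketch dependency-free.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://bioinfo.life.hust.edu.cn/lncRNASNP/api/exp_disease_list"


def fetch_page(page):
    """Download one page of the endpoint and return the decoded JSON."""
    with urlopen(f"{BASE_URL}?page={page}", timeout=30) as resp:
        return json.load(resp)


def collect_rows(first_page=1, last_page=3, fetch=fetch_page):
    """Gather the "lncrna_gene_list" rows from a range of pages.

    `fetch` is injectable so the loop can be tried without network access.
    """
    rows = []
    for page in range(first_page, last_page + 1):
        batch = fetch(page).get("lncrna_gene_list", [])
        if not batch:  # assumption: an empty list means there are no more pages
            break
        rows.extend(batch)
    return rows
```

The result is a plain list of dicts, so pd.DataFrame(collect_rows(1, 5)) should give one frame covering pages 1 to 5. Keep the page range small and be gentle with the server.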

Upvotes: 1
